Abstract

Solomonoff unified Occam’s razor and Epicurus’ principle of multiple explanations into one elegant, formal, universal theory of inductive inference, which initiated the field of algorithmic information theory. His central result is that the posterior of the universal semimeasure M converges rapidly to the true sequence-generating posterior μ, if the latter is computable. Hence, M is eligible as a universal predictor in case of unknown μ. The first part of the paper investigates the existence and convergence of computable universal (semi)measures for a hierarchy of computability classes: recursive, estimable, enumerable, and approximable. For instance, M is known to be enumerable, but not estimable, and to dominate all enumerable semimeasures. We present proofs for discrete and continuous semimeasures. The second part investigates more closely the types of convergence possibly implied by universality: in difference and in ratio, with probability 1, in mean sum, and for Martin-Löf random sequences. We introduce a generalized concept of randomness for individual sequences and use it to exhibit difficulties regarding these issues. In particular, we show that convergence fails (holds) on generalized-random sequences in gappy (dense) Bernoulli classes.

Keywords 

Sequence prediction; Algorithmic Information Theory; Solomonoff’s prior; universal probability; mixture distributions; posterior convergence; computability concepts; Martin-Löf randomness.
1 Introduction

All induction problems can be phrased as sequence prediction tasks. This is, for instance, obvious for time-series prediction, but also includes classification tasks. Having observed data $x_i$ at times $i<t$, the task is to predict the $t$-th symbol $x_t$ from the sequence $x = x_1...x_{t-1}$. The key concept to attack general induction problems is Occam’s razor (simplicity) principle, which says that “Entities should not be multiplied beyond necessity,” and to a lesser extent Epicurus’ principle of multiple explanations. The former/latter may be interpreted as to keep the simplest/all theories consistent with the observations $x_1...x_{t-1}$ and to use these theories to predict $x_t$. Kolmogorov (and others) defined the complexity of a string as the length of its shortest description on a universal Turing machine. The Kolmogorov complexity K is an excellent universal complexity measure, suitable for quantifying Occam’s razor. There is (only) one disadvantage: K is not computable.

More precisely, a function f is said to be recursive (or finitely computable) if there exists a Turing machine that, given x, computes f(x) and then halts. Some functions are not recursive but still approximable (or limit-computable) in the sense that there is a nonhalting Turing machine with an infinite (x-dependent) output sequence $y_1,y_2,y_3,...$ and $\lim_{t\to\infty}y_t = f(x)$. If additionally the output sequence is monotone increasing/decreasing, then f is said to be lower/upper semicomputable (or enumerable/co-enumerable). Finally we call f estimable if some Turing machine, given x and a precision ε, finitely computes an ε-approximation of f(x). The major algorithmic property of K is that it is co-enumerable, but not recursive.

More suitable for predictions is Solomonoff’s universal prior M(x), defined as the probability that the output of a universal monotone Turing machine U starts with string x when provided with fair coin flips on the input tape. M(x) is enumerable and roughly equal to $2^{-K(x)}$, hence implementing Occam’s and also Epicurus’ principles.

Assume now that strings x are sampled from a probability distribution μ, i.e. the probability of a string starting with x shall be μ(x). The probability of observing $x_t$ at time t, given past observations $x_1...x_{t-1}$, is $\mu(x_t|x_1...x_{t-1}) = \mu(x_1...x_t)/\mu(x_1...x_{t-1})$. Solomonoff’s [Sol78] central result is that the universal posterior $M(x_t|x_1...x_{t-1}) = M(x_1...x_t)/M(x_1...x_{t-1})$ converges rapidly to the true (objective) posterior probability $\mu(x_t|x_1...x_{t-1})$, if μ is an estimable measure; hence M can be used for predictions in case of unknown μ. One representation of M is as a $2^{-K(\nu)}$-weighted sum of all enumerable “defective” probability measures ν, called semimeasures. The (from this representation obvious) dominance $M(x)\stackrel{\times}{\geq}2^{-K(\mu)}\mu(x)$ for all enumerable μ is the central ingredient in the convergence proof.
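The ratio form of the posterior and the dominance property can be tried out on a small example. The Python sketch below is a minimal illustration under stated assumptions: it replaces the incomputable M by a finite mixture ξ over three Bernoulli measures, and the parameters and weights (stand-ins for $2^{-K(\nu)}$) are hypothetical choices, not taken from the paper.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite class of Bernoulli measures; theta is the probability of a 1.
THETAS = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
# Hypothetical prior weights standing in for 2^{-K(nu)}; they sum to 7/8 <= 1.
WEIGHTS = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 8)]


def bernoulli(theta, x):
    """nu_theta(x): probability that a sequence starts with the bit string x."""
    ones = x.count("1")
    return theta**ones * (1 - theta) ** (len(x) - ones)


def xi(x):
    """Mixture xi(x) = sum over nu of w_nu * nu(x)."""
    return sum(w * bernoulli(th, x) for w, th in zip(WEIGHTS, THETAS))


def posterior(prefix, symbol):
    """xi(symbol | prefix) = xi(prefix + symbol) / xi(prefix)."""
    return xi(prefix + symbol) / xi(prefix)


def check_dominance(max_len):
    """Verify xi(x) >= w_nu * nu(x) for every nu and every x with len(x) <= max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            x = "".join(bits)
            for w, th in zip(WEIGHTS, THETAS):
                if xi(x) < w * bernoulli(th, x):
                    return False
    return True
```

For instance, `posterior("1111", "1")` evaluates to 373/588 ≈ 0.63: after four ones the mixture shifts weight toward the θ = 3/4 component. Exact `Fraction` arithmetic keeps the dominance check free of rounding effects.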

Dominance and convergence immediately generalize to arbitrary weighted sums of (semi)measures over some arbitrary countable set $\mathcal{M}$. So what is so special about the class of all enumerable semimeasures $\mathcal{M}_{enum}^{semi}$? The larger we choose $\mathcal{M}$, the less restrictive is the essential assumption that $\mathcal{M}$ should contain the true distribution μ. Why not restrict to the still rather general class of estimable or recursive (semi)measures? For every countable class $\mathcal{M}$ and $\xi_{\mathcal{M}}(x) := \sum_{\nu\in\mathcal{M}}w_\nu\nu(x)$ with $w_\nu>0$, the important dominance $\xi_{\mathcal{M}}(x)\geq w_\nu\nu(x)\ \forall\nu\in\mathcal{M}$ is satisfied. The question is what properties $\xi_{\mathcal{M}}$ possesses. The distinguishing property of $\mathcal{M}_{enum}^{semi}$ is that $\xi_{\mathcal{M}_{enum}^{semi}}$ (which coincides with M up to a multiplicative constant) is itself an element of $\mathcal{M}_{enum}^{semi}$. On the other hand, for prediction, $\xi_{\mathcal{M}}\in\mathcal{M}$ is not by itself an important property. What matters is whether $\xi_{\mathcal{M}}$ is computable (in one of the senses defined above), to avoid getting into the (un)realm of non-constructive math.


Our first contribution is to classify the existence of universal computable (semi)measures. From [ZL70] we know that there is an enumerable semimeasure (namely M) that dominates all enumerable semimeasures in $\mathcal{M}_{enum}^{semi}$. We show that there is no estimable semimeasure that dominates all recursive measures (also mentioned in [ZL70]), and there is no approximable semimeasure that dominates all approximable measures. From this it follows that, for a universal (semi)measure that at least satisfies the weakest form of computability, namely being approximable, the largest dominated class among the classes considered in this work is the class of enumerable semimeasures. This is the distinguishing property of $\mathcal{M}_{enum}^{semi}$ and M. This investigation was motivated by recent generalizations of Kolmogorov complexity and Solomonoff’s prior by Schmidhuber [Sch00, Sch02].


The second contribution is to investigate more closely the types of convergence possibly implied by universality: in difference and in ratio, with probability 1, in mean sum, and for Martin-Löf random sequences. We introduce a generalized concept of randomness for individual sequences and use it to exhibit difficulties regarding these issues. More concretely, we consider countable classes $\mathcal{M}$ of Bernoulli environments and show that ξ converges to μ on all generalized random sequences if and only if the class is dense.


Contents. In Section 2 we review various computability concepts and discuss their relation. In Section 3 we define the prefix Kolmogorov complexity K, the concept of (semi)measures, and Solomonoff’s universal prior M, and explain its universality. Section 4 summarizes Solomonoff’s major convergence result and discusses general mixture distributions and the important universality property: multiplicative dominance. In Section 5 we define seven classes of (semi)measures based on four computability concepts. Each class may or may not contain a (semi)measure that dominates all elements of another class. We reduce the analysis of these 49 cases to four basic cases. Domination (essentially by M) is known to be true for two cases. The other two cases do not allow for domination; the proofs are given in Section 6. In Section 7 we investigate more closely the type of convergence implied by universality. We summarize the result on posterior convergence in difference ($\xi-\mu\to 0$) and improve the previous result [LV97] on the convergence in ratio $\xi/\mu\to 1$ by showing rapid convergence without use of martingales. In Section 8 we investigate whether convergence for all Martin-Löf random sequences could hold. We define a generalized concept of randomness for individual sequences and use it to show that proofs based on universality cannot decide this question. Section 9 concludes the paper.
Notation. We denote strings of length n over a finite alphabet $\mathcal{X}$ by $x = x_1x_2...x_n$ with $x_t\in\mathcal{X}$ and further abbreviate $x_{1:n} := x_1x_2...x_{n-1}x_n$ and $x_{<n} := x_1...x_{n-1}$, ε for the empty string, ℓ(x) for the length of string x, and $\omega = x_{1:\infty}$ for infinite sequences. We write xy for the concatenation of string x with y. We abbreviate $\lim_{n\to\infty}[f(n)-g(n)] = 0$ by $f(n)\to g(n)$ and say f converges to g, without implying that $\lim_{n\to\infty}g(n)$ itself exists. We write $f(x)\stackrel{\times}{\geq}g(x)$ for $g(x) = O(f(x))$, i.e. if $\exists c>0: f(x)\geq c\,g(x)\ \forall x$.


2 Computability Concepts 


We define several computability concepts weaker than those that can be captured by halting Turing machines.

Definition 1 (Computable functions) We consider functions $f:\mathbb{N}\to\mathbb{R}$:

f is recursive or finitely computable iff there are Turing machines $T_{1/2}$ with output interpreted as natural numbers and $f(x) = T_1(x)/T_2(x)$,
f is approximable or limit-computable iff $\exists$ recursive $\phi(\cdot,\cdot)$ with $\lim_{t\to\infty}\phi(x,t) = f(x)$,
f is enumerable or lower semicomputable iff additionally $\phi(x,t)\leq\phi(x,t+1)$,
f is co-enumerable or upper semicomputable iff $[-f]$ is lower semicomputable,
f is semicomputable iff f is lower or upper semicomputable,
f is estimable iff f is lower and upper semicomputable.


If f is estimable we can finitely compute an ε-approximation of f by upper and lower semicomputing f and terminating when the two differ by less than ε. This means that there is a Turing machine which, given x and ε, finitely computes $y\in\mathbb{Q}$ such that $|y-f(x)|<\varepsilon$. Moreover it gives an interval estimate $f(x)\in[y-\varepsilon,y+\varepsilon]$. An estimable integer-valued function is recursive (take any ε < ½). Note that if f is only approximable or semicomputable we can still come arbitrarily close to f(x), but we cannot devise a terminating algorithm that produces an ε-approximation. In the case of lower/upper semicomputability we can at least finitely compute lower/upper bounds to f(x). In case of approximability, the weakest computability form, even this capability is lost.
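The procedure “run both semicomputations until they are ε-close” can be written as a short loop. In this minimal Python sketch, `estimate`, `sqrt2_lower` and `sqrt2_upper` are hypothetical names, and √2 (bracketed by dyadic rationals) merely serves as an example of an estimable number; nothing here is specific to the paper.

```python
import math
from fractions import Fraction


def estimate(lower, upper, eps):
    """Finitely compute y with |y - f(x)| < eps for an estimable value f(x).

    lower(t) is a non-decreasing and upper(t) a non-increasing sequence of
    rationals, both converging to f(x), with lower(t) <= f(x) <= upper(t).
    Both semicomputations are advanced in lockstep until they are eps-close.
    Returns the midpoint y together with the bracketing interval [lo, hi].
    """
    t = 0
    while True:
        lo, hi = lower(t), upper(t)
        if hi - lo < eps:
            return (lo + hi) / 2, lo, hi
        t += 1


def sqrt2_lower(t):
    """Largest multiple of 2^-t that is <= sqrt(2)."""
    return Fraction(math.isqrt(2 * 4**t), 2**t)


def sqrt2_upper(t):
    """Smallest multiple of 2^-t that is > sqrt(2) (sqrt(2) is irrational)."""
    return Fraction(math.isqrt(2 * 4**t) + 1, 2**t)


y, lo, hi = estimate(sqrt2_lower, sqrt2_upper, Fraction(1, 10**6))
```

Returning the bracket together with y mirrors the interval estimate $f(x)\in[y-\varepsilon,y+\varepsilon]$. For a function that is only enumerable, no `upper` sequence is available and the loop could not be written.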


Implications between the computability concepts:

recursive = finitely computable ⇒ estimable ⇒ enumerable = lower semicomputable (and co-enumerable = upper semicomputable) ⇒ semicomputable ⇒ approximable = limit-computable
What we call estimable or recursive (finitely computable) is often just called computable, but it makes sense to separate the concepts in this work, since finite computability is conceptually easier and some previous results have only been proved for this case. Sometimes we use the word computable generically for some of the computability forms of Definition 1.

3 The Universal Prior M 

The prefix Kolmogorov complexity K(x) is defined as the length of the shortest binary (prefix) program $p\in\{0,1\}^*$ for which a universal prefix Turing machine U (with binary program tape and $\mathcal{X}$-ary output tape) outputs string $x\in\mathcal{X}^*$, and similarly K(x|y) in case of side information y [Kol65, Lev74, Gác74, Cha75]:

$$K(x) = \min\{\ell(p): U(p) = x\}, \qquad K(x|y) = \min\{\ell(p): U(p,y) = x\}$$

Solomonoff [Sol64, Eq.(7)] defined (earlier) the closely related quantity, the universal posterior $M(y|x) = M(xy)/M(x)$. The universal prior M(x) can be defined as the probability that the output of a universal monotone Turing machine U starts with x when provided with fair coin flips on the input tape. Formally, M can be defined as

$$M(x) := \sum_{p\,:\,U(p)=x*}2^{-\ell(p)} \qquad (1)$$

where the sum is over minimal programs p for which U outputs a string starting with x. The so-called minimal programs are defined similarly to the prefix programs, but U need not halt, which is indicated by the *. Minimal programs are those that lie to the left of the input head at the moment U writes the last bit of x [LV97, Hut04]. Before we can discuss the stochastic properties of M we need the concept of (semi)measures for strings.
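Definition (1) can be explored on a machine much simpler than a universal one. The Python sketch below is illustrative only: `run_toy` simulates a hypothetical two-mode monotone machine T (not universal, so the resulting $M_T$ is not Solomonoff’s M), and `m_toy` sums $2^{-\ell(p)}$ over the minimal programs for x, recognized by the rule that T writes the last symbol of x exactly when it has read the last bit of p.

```python
from fractions import Fraction
from itertools import product


def run_toy(p, n):
    """Simulate a hypothetical toy monotone machine T on a finite program p.

    Mode "1b": print the bit b forever (a 2-bit description of b b b ...).
    Mode "0":  read the rest in pairs (b, c); c == "0" prints b, c == "1" stops T.
    Returns the symbols written before the input ran out (at most n of them)
    and, for each symbol, how many program bits had been read when it was written.
    """
    if len(p) >= 2 and p[0] == "1":
        return [p[1]] * n, [2] * n
    out, read_at = [], []
    i = 1
    while p[:1] == "0" and len(out) < n and i + 2 <= len(p):
        b, c = p[i], p[i + 1]
        i += 2
        if c == "1":
            break
        out.append(b)
        read_at.append(i)
    return out, read_at


def is_minimal(p, x):
    """p is a minimal program for x: T writes the last symbol of x just as p is used up."""
    out, read_at = run_toy(p, len(x))
    return "".join(out) == x and read_at[-1] == len(p)


def m_toy(x):
    """M_T(x): sum of 2^-len(p) over minimal programs p for the non-empty string x."""
    total = Fraction(0)
    for length in range(1, 2 * len(x) + 2):   # minimal programs here have <= 2|x|+1 bits
        for bits in product("01", repeat=length):
            if is_minimal("".join(bits), x):
                total += Fraction(1, 2**length)
    return total
```

The constant string 0000 receives weight $2^{-2}+2^{-9} = 129/512$, because the 2-bit program `"10"` describes it, whereas 0110 only has its literal 9-bit description and gets 1/512 (Occam). The strict inequality $M_T(0) = 3/8 > M_T(00)+M_T(01) = 5/16$ comes from programs such as `"00001"` that stop after printing 0, mirroring why M is only a semimeasure.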

Definition 2 (Continuous (semi)measures) μ(x) denotes the probability that a sequence starts with string x. We call μ ≥ 0 a (continuous) semimeasure if $\mu(\epsilon)\leq 1$ and $\mu(x)\geq\sum_{a\in\mathcal{X}}\mu(xa)$, and a (probability) measure if equalities hold.

The reason for calling μ with the above property a probability measure is that it satisfies Kolmogorov’s axioms of probability in the following sense: The sample space is $\Omega = \mathcal{X}^\infty$ with elements $\omega = \omega_1\omega_2\omega_3...\in\mathcal{X}^\infty$ being infinite sequences over alphabet $\mathcal{X}$. The set of events (the σ-algebra) is defined as the set generated from the cylinder sets $\Gamma_{x_{1:n}} := \{\omega: \omega_{1:n} = x_{1:n}\}$ by countable union and complement. A probability measure μ is uniquely defined by giving its values $\mu(\Gamma_{x_{1:n}})$ on the cylinder sets, which we abbreviate by $\mu(x_{1:n})$. We will also call μ a measure, or even more loosely a probability distribution.

We have $M(x)>\sum_{a\in\mathcal{X}}M(xa)$ because there are programs p that output x, not followed by any $a\in\mathcal{X}$. They just stop after printing x or continue forever without any further output. Together with M(ε) = 1 this shows that M is a semimeasure, but not a probability measure. We can now state the fundamental property of M [ZL70, Sol78]:

Theorem 3 (Universality of M) The universal prior M is an enumerable semimeasure that multiplicatively dominates all enumerable semimeasures in the sense that $M(x)\stackrel{\times}{\geq}2^{-K(\rho)}\cdot\rho(x)$ for all enumerable semimeasures ρ. M is enumerable, but not estimable (nor recursive).

The Kolmogorov complexity of a function like ρ is defined as the length of the shortest self-delimiting code of a Turing machine computing this function in the sense of Definition 1. Up to a multiplicative constant, M assigns higher probability to all x than any other computable probability distribution.

It is possible to normalize M to a true probability measure $M_{norm}$ [Sol78, LV97] with dominance still being true, but at the expense of giving up enumerability ($M_{norm}$ is still approximable). M is more convenient when studying algorithmic questions, but a true probability measure like $M_{norm}$ is more convenient when studying stochastic questions.


4 Universal Sequence Prediction 

In which sense does M incorporate Occam’s razor and Epicurus’ principle of multiple explanations? Since the shortest programs p dominate the sum in M, M(x) is roughly equal to $2^{-K(x)}$, i.e. M assigns high probability to simple strings. More useful is to think of x as being the observed history. We see from (1) that every program p consistent with history x is allowed to contribute to M (Epicurus). On the other hand, shorter programs give significantly larger contributions (Occam). How does all this affect prediction? If M(x) describes our (subjective) prior belief in x, then $M(y|x) := M(xy)/M(x)$ must be our posterior belief in y. From the symmetry of algorithmic information $K(xy)\approx K(y|x)+K(x)$, and $M(x)\approx 2^{-K(x)}$ and $M(xy)\approx 2^{-K(xy)}$, we get $M(y|x)\approx 2^{-K(y|x)}$. This tells us that M predicts y with high probability iff y has an easy explanation, given x (Occam & Epicurus).

The above qualitative discussion should not create the impression that M(x) and $2^{-K(x)}$ always lead to predictors of comparable quality. Indeed, in the online/incremental setting, where y is a single symbol, K(y) = O(1) invalidates the consideration above. The proof of (3) below, for instance, depends on M being a semimeasure and the chain rule being exactly true, neither of which is satisfied by $2^{-K(x)}$. See [Hut03b] for a detailed analysis.

Sequence prediction algorithms try to predict the continuation $x_t\in\mathcal{X}$ of a given sequence $x_1...x_{t-1}$. The following bound shows that M predicts computable sequences well:

$$\sum_{t=1}^\infty\big(1-M(x_t|x_{<t})\big)^2 \;\leq\; -\frac12\sum_{t=1}^\infty\ln M(x_t|x_{<t}) \;=\; -\frac12\ln M(x_{1:\infty}) \;\leq\; \frac{\ln 2}{2}\,Km(x_{1:\infty}), \qquad (2)$$

where the monotone complexity $Km(x_{1:\infty}) = \min\{\ell(p): U(p) = x_{1:\infty}\}$ is defined as the length of the shortest (nonhalting) program computing $x_{1:\infty}$ [ZL70, Lev73]. In the first inequality we have used $(1-a)^2\leq-\frac12\ln a$ for $0<a\leq 1$. In the equality we exchanged the sum with the logarithm and eliminated the resulting product by the chain rule. In the last inequality we used $M(x)\geq 2^{-Km(x)}$, which follows from (1) by dropping all terms in $\sum_p$ except for the shortest p computing x. If $x_{1:\infty}$ is a computable sequence, then $Km(x_{1:\infty})$ is finite, which implies $M(x_t|x_{<t})\to 1$, since the terms of a convergent series tend to zero. This means that if the environment is a computable sequence (whichsoever, e.g. the digits of π or e in $\mathcal{X}$-ary representation), after having seen the first few digits, M correctly predicts the next digit with high probability, i.e. it recognizes the structure of the sequence.
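Both steps behind the first line of (2), the termwise inequality and the chain-rule telescoping, can be checked numerically. Since M itself cannot be computed, this minimal Python sketch substitutes Laplace’s rule of succession (the predictive distribution of a uniform mixture over Bernoulli measures) and feeds it the all-zeros sequence; the predictor and the sequence are illustrative assumptions.

```python
import math


def laplace_zero_prob(t):
    """Laplace's rule after t-1 zeros: probability t/(t+1) that x_t is 0 again.

    Hypothetical stand-in for M(x_t | x_<t) on the sequence 000...; it is the
    predictive distribution of a uniform-prior mixture of Bernoulli measures.
    """
    return t / (t + 1)


def check_bound(n):
    """Check the first line of (2) up to time n; return (squared errors, -1/2 ln joint)."""
    sq_err, neg_half_log = 0.0, 0.0
    for t in range(1, n + 1):
        a = laplace_zero_prob(t)
        assert (1 - a) ** 2 <= -0.5 * math.log(a)   # termwise: (1-a)^2 <= -1/2 ln a
        sq_err += (1 - a) ** 2
        neg_half_log += -0.5 * math.log(a)
    # Chain rule: the product of the predictive probabilities equals the joint
    # probability of 0^n under the uniform mixture, which is 1/(n+1).
    assert math.isclose(neg_half_log, 0.5 * math.log(n + 1))
    return sq_err, neg_half_log
```

Here $-\frac12\ln$ of the joint probability equals $\frac12\ln(n+1)$ and keeps growing, while the squared errors sum to less than $\pi^2/6-1$; for M on a computable sequence the right-hand side stays bounded by $\frac{\ln 2}{2}Km(x_{1:\infty})$.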

Assume now that the true sequence is drawn from a computable probability distribution μ, i.e. the true (objective) probability of $x_{1:t}$ is $\mu(x_{1:t})$. The probability of $x_t$ given $x_{<t}$ hence is $\mu(x_t|x_{<t}) = \mu(x_{1:t})/\mu(x_{<t})$. Solomonoff’s [Sol78] central result is that M converges to μ. More precisely, for binary alphabet, he showed that

$$\sum_{t=1}^\infty\ \sum_{x_{<t}\in\{0,1\}^{t-1}}\mu(x_{<t})\big(M(0|x_{<t})-\mu(0|x_{<t})\big)^2 \;\leq\; \tfrac12\ln 2\cdot K(\mu)+O(1) \;<\; \infty. \qquad (3)$$

The infinite sum can only be finite if the difference $M(0|x_{<t})-\mu(0|x_{<t})$ tends to zero for t → ∞ with μ-probability 1 (see Definition 10(i) and [Hut01], or Section 7 for general alphabet). This holds for any computable probability distribution μ. The reason for the astonishing property of a single (universal) function to converge to any computable probability distribution lies in the fact that the sets of μ-random sequences differ for different μ. Past data $x_{<t}$ are exploited to get a (with t → ∞) improving estimate $M(x_t|x_{<t})$ of $\mu(x_t|x_{<t})$.
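Bound (3) can be evaluated exactly for a small mixture. The Python sketch below is a hedged illustration, not Solomonoff’s construction: M is replaced by a finite mixture ξ over five Bernoulli measures with hypothetical uniform weights, and the true μ is assumed to be one of them. For such a mixture the same argument gives the bound $\frac12\ln w_\mu^{-1}$, which for $w_\mu = 2^{-K(\mu)}$ is the right-hand side of (3) without the O(1) term. Because Bernoulli probabilities depend only on the number of ones in $x_{<t}$, the expectation over all $2^{t-1}$ histories collapses to a sum over counts.

```python
import math


def expected_sq_error_sum(thetas, weights, true_theta, n):
    """Exact value of sum_{t=1..n} E[(xi(0|x_<t) - mu(0|x_<t))^2] for a Bernoulli mu.

    thetas/weights: a hypothetical finite Bernoulli class and prior (theta = P(1)).
    Bernoulli probabilities of x_<t depend only on its number k of ones, so the
    expectation over all 2^(t-1) histories is a sum over k with binomial counts.
    """
    mu0 = 1 - true_theta                        # mu(0 | x_<t) does not depend on x_<t
    total = 0.0
    for m in range(n):                           # m = t - 1 = length of the history
        for k in range(m + 1):
            mu_hist = true_theta**k * mu0 ** (m - k)
            comps = [w * th**k * (1 - th) ** (m - k) for th, w in zip(thetas, weights)]
            xi_hist = sum(comps)
            xi_hist0 = sum(c * (1 - th) for c, th in zip(comps, thetas))
            xi0 = xi_hist0 / xi_hist             # xi(0 | x_<t) = xi(x_<t 0) / xi(x_<t)
            total += math.comb(m, k) * mu_hist * (xi0 - mu0) ** 2
    return total


THETAS = [0.1, 0.3, 0.5, 0.7, 0.9]
WEIGHTS = [0.2] * 5
err = expected_sq_error_sum(THETAS, WEIGHTS, true_theta=0.7, n=100)
bound = 0.5 * math.log(1 / 0.2)   # (1/2) ln(1/w_mu); with w_mu = 2^-K(mu) this is (1/2) ln 2 * K(mu)
```

The bound does not depend on n, so the accumulated expected error stays below $\frac12\ln 5\approx 0.80$ however long the sequence is.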

The universality property (Theorem 3) is the central ingredient in the proof of (3). The proof involves the construction of a semimeasure ξ whose dominance is obvious. The hard part is to show its enumerability and equivalence to M. Let $\mathcal{M}$ be the (countable) set of all enumerable semimeasures and define

$$\xi(x) := \sum_{\nu\in\mathcal{M}}2^{-K(\nu)}\nu(x). \qquad (4)$$

Then dominance

$$\xi(x)\geq 2^{-K(\nu)}\nu(x)\quad\forall\nu\in\mathcal{M} \qquad (5)$$

is obvious. Is ξ lower semicomputable? To answer this question one has to be more precise. Levin [ZL70] has shown that the set of all lower semicomputable semimeasures is enumerable (with repetitions). For this (ordered multi)set $\mathcal{M} = \mathcal{M}_{enum}^{semi} := \{\nu_1,\nu_2,\nu_3,...\}$ and $K(\nu_i) := K(i)$ one can easily see that ξ is lower semicomputable. Finally, proving $M(x)\stackrel{\times}{\geq}\xi(x)$ also establishes universality of M (see [Sol78, LV97] for details).
The advantage of ξ over M is that it immediately generalizes to arbitrary weighted sums of (semi)measures for arbitrary countable $\mathcal{M}$.

5 Universal (Semi)Measures 

What is so special about the set of all enumerable semimeasures $\mathcal{M}_{enum}^{semi}$? The larger we choose $\mathcal{M}$, the less restrictive is the assumption that $\mathcal{M}$ should contain the true distribution μ, which will be essential throughout the paper. Why not restrict to the still rather general class of estimable or recursive (semi)measures? It is clear that for every countable (multi)set $\mathcal{M}$, the universal or mixture distribution

$$\xi(x) := \xi_{\mathcal{M}}(x) := \sum_{\nu\in\mathcal{M}}w_\nu\nu(x) \quad\text{with}\quad \sum_{\nu\in\mathcal{M}}w_\nu\leq 1 \ \text{ and }\ w_\nu>0 \qquad (6)$$

dominates all $\nu\in\mathcal{M}$. This dominance is necessary for the desired convergence ξ → μ, similarly to (3). The question is what properties ξ possesses. The distinguishing property of $\mathcal{M}_{enum}^{semi}$ is that ξ is itself an element of $\mathcal{M}_{enum}^{semi}$. When concerned with predictions, $\xi_{\mathcal{M}}\in\mathcal{M}$ is not by itself an important property; what matters is whether ξ is computable in one of the senses of Definition 1. We define

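$\mathcal{M}_1\stackrel{\times}{\geq}\mathcal{M}_2$ :⇔ there is an element of $\mathcal{M}_1$ that dominates all elements of $\mathcal{M}_2$
:⇔ $\exists\rho\in\mathcal{M}_1\ \forall\nu\in\mathcal{M}_2\ \exists w_\nu>0\ \forall x: \rho(x)\geq w_\nu\nu(x)$.

The relation $\stackrel{\times}{\geq}$ is transitive (but not necessarily reflexive) in the sense that $\mathcal{M}_1\stackrel{\times}{\geq}\mathcal{M}_2\stackrel{\times}{\geq}\mathcal{M}_3$ implies $\mathcal{M}_1\stackrel{\times}{\geq}\mathcal{M}_3$, and $\mathcal{M}_0\supseteq\mathcal{M}_1\stackrel{\times}{\geq}\mathcal{M}_2\supseteq\mathcal{M}_3$ implies $\mathcal{M}_0\stackrel{\times}{\geq}\mathcal{M}_3$. For the computability concepts introduced in Section 2 we have the following proper set inclusions: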


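$$
\begin{array}{ccccccc}
\mathcal{M}_{rec}^{msr} & \subset & \mathcal{M}_{est}^{msr} & \equiv & \mathcal{M}_{enum}^{msr} & \subset & \mathcal{M}_{appr}^{msr} \\
\cap & & \cap & & \cap & & \cap \\
\mathcal{M}_{rec}^{semi} & \subset & \mathcal{M}_{est}^{semi} & \subset & \mathcal{M}_{enum}^{semi} & \subset & \mathcal{M}_{appr}^{semi}
\end{array}
$$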


where $\mathcal{M}_c^{msr}$ stands for the set of all probability measures of the appropriate computability type c ∈ {rec=recursive, est=estimable, enum=enumerable, appr=approximable}, and similarly $\mathcal{M}_c^{semi}$ for semimeasures. From an enumeration of a measure ρ one can construct a co-enumeration by exploiting $\rho(x_{1:n}) = 1-\sum_{y_{1:n}\neq x_{1:n}}\rho(y_{1:n})$. This shows that every enumerable measure is also co-enumerable, hence estimable, which proves the identity ≡ above.
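The co-enumeration trick only needs lower bounds on the other strings of the same length. The following minimal Python sketch is illustrative: the Bernoulli(1/3) measure is a hypothetical example, and for simplicity its enumeration `mu_lower` is produced from the known exact values (dyadic truncations), whereas in the argument above only the enumeration itself would be available.

```python
import math
from fractions import Fraction
from itertools import product

THETA = Fraction(1, 3)   # hypothetical Bernoulli measure: probability of a 1


def mu(x):
    """Exact Bernoulli(THETA) probability of the string x (used only to build the example)."""
    ones = x.count("1")
    return THETA**ones * (1 - THETA) ** (len(x) - ones)


def mu_lower(x, t):
    """An enumeration of mu: floor(mu(x) * 2^t) / 2^t is non-decreasing in t."""
    return Fraction(math.floor(mu(x) * 2**t), 2**t)


def mu_upper(x, t):
    """Co-enumeration via mu(x) = 1 - sum_{y != x, |y| = |x|} mu(y), using only lower bounds."""
    ys = ("".join(bits) for bits in product("01", repeat=len(x)))
    return 1 - sum(mu_lower(y, t) for y in ys if y != x)
```

Because every `mu_lower(y, t)` is non-decreasing in t, `mu_upper(x, t)` is non-increasing and never drops below μ(x), which is exactly a co-enumeration.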

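With this notation, Theorem 3 implies $\mathcal{M}_{enum}^{semi}\stackrel{\times}{\geq}\mathcal{M}_{enum}^{semi}$. Transitivity allows us to conclude, for instance, that $\mathcal{M}_{appr}^{semi}\stackrel{\times}{\geq}\mathcal{M}_{rec}^{msr}$, i.e. that there is an approximable semimeasure that dominates all recursive measures.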

The standard “diagonalization” way of proving $\mathcal{M}_1\stackrel{\times}{\not\geq}\mathcal{M}_2$ is to take an arbitrary $\mu\in\mathcal{M}_1$ and “increase” it to ρ such that $\mu\stackrel{\times}{\not\geq}\rho$, and show that $\rho\in\mathcal{M}_2$. There are 7×7 combinations of (semi)measures $\mathcal{M}_1$ with $\mathcal{M}_2$ for which $\mathcal{M}_1\stackrel{\times}{\geq}\mathcal{M}_2$ could be true or false. There are four basic cases, explicated in the following theorem, from which the remaining combinations displayed in Table 5 follow by transitivity.


Computable Universal Priors 




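Theorem 4 (Universal (semi)measures) A semimeasure ρ is said to be universal for $\mathcal{M}$ if it multiplicatively dominates all elements of $\mathcal{M}$ in the sense $\forall\nu\in\mathcal{M}\ \exists w_\nu>0: \rho(x)\geq w_\nu\nu(x)\ \forall x$. The following holds true:

o) $\exists\rho: \{\rho\}\stackrel{\times}{\geq}\mathcal{M}$: For every countable set of (semi)measures $\mathcal{M}$, there is a (semi)measure that dominates all elements of $\mathcal{M}$.

i) $\mathcal{M}_{enum}^{semi}\stackrel{\times}{\geq}\mathcal{M}_{enum}^{semi}$: The class of enumerable semimeasures contains a universal element.

ii) $\mathcal{M}_{appr}^{msr}\stackrel{\times}{\geq}\mathcal{M}_{enum}^{semi}$: There is an approximable measure that dominates all enumerable semimeasures.

iii) $\mathcal{M}_{est}^{semi}\stackrel{\times}{\not\geq}\mathcal{M}_{rec}^{msr}$: There is no estimable semimeasure that dominates all recursive measures.

iv) $\mathcal{M}_{appr}^{semi}\stackrel{\times}{\not\geq}\mathcal{M}_{appr}^{msr}$: There is no approximable semimeasure that dominates all approximable measures.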


Table 5 (Existence of universal (semi)measures) The entry in row r and column c indicates whether there is an r-able (semi)measure ρ dominating the set $\mathcal{M}$ that contains all c-able (semi)measures, where r, c ∈ {recurs, estimat, enumer, approxim}. Enumerable measures are estimable; this is why the enum. row and column are missing in the case of measures. The superscript indicates from which part of Theorem 4 the answer follows: for the bold-face entries directly, for the others using transitivity of $\stackrel{\times}{\geq}$.


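$$
\begin{array}{l|cccc|ccc}
\rho\ \backslash\ \mathcal{M} & \text{semi rec.} & \text{semi est.} & \text{semi enum.} & \text{semi appr.} & \text{msr rec.} & \text{msr est.} & \text{msr appr.} \\
\hline
\text{semi rec.}  & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} \\
\text{semi est.}  & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} & \mathbf{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} \\
\text{semi enum.} & \text{yes}^{i} & \text{yes}^{i} & \mathbf{yes}^{i} & \text{no}^{iv} & \text{yes}^{i} & \text{yes}^{i} & \text{no}^{iv} \\
\text{semi appr.} & \text{yes}^{i} & \text{yes}^{i} & \text{yes}^{i} & \text{no}^{iv} & \text{yes}^{i} & \text{yes}^{i} & \mathbf{no}^{iv} \\
\hline
\text{msr rec.}   & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} \\
\text{msr est.}   & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} & \text{no}^{iii} \\
\text{msr appr.}  & \text{yes}^{ii} & \text{yes}^{ii} & \mathbf{yes}^{ii} & \text{no}^{iv} & \text{yes}^{ii} & \text{yes}^{ii} & \text{no}^{iv}
\end{array}
$$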


If we ask for a universal (semi)measure that at least satisfies the weakest form of computability, namely being approximable, we see that the largest dominated set among the 7 sets defined above is the set of enumerable semimeasures. This is the reason why $\mathcal{M}_{enum}^{semi}$ plays a special role. On the other hand, $\mathcal{M}_{enum}^{semi}$ is not the largest set dominated by an approximable semimeasure, and indeed no such largest set exists. One may, hence, ask for “natural” larger sets $\mathcal{M}$. One such set, namely the set of cumulatively enumerable semimeasures $\mathcal{M}_{CEM}$, has recently been discovered by Schmidhuber [Sch00, Sch02], for which even $\xi_{CEM}\in\mathcal{M}_{CEM}$ holds.

Theorem 4 also holds for discrete (semi)measures P defined as follows:

Definition 6 (Discrete (semi)measures) P(x) denotes the probability of $x\in\mathbb{N}$. We call $P:\mathbb{N}\to[0,1]$ a discrete semimeasure if $\sum_{x\in\mathbb{N}}P(x)\leq 1$, and a discrete measure if equality holds.

Theorem 4(i) is Levin’s major result [LV97, Thm.4.3.1 & Thm.4.5.1], and (ii) is due to Solomonoff [Sol78]. The proof of $\mathcal{M}_{rec}^{semi}\stackrel{\times}{\not\geq}\mathcal{M}_{rec}^{msr}$ in [LV97, p249] contains minor errors and is not extensible to (iii), and the proof in [LV97, p276] only applies to infinite alphabet and not to the binary/finite case considered here. (iii) is mentioned in [ZL70] without proof. A direct proof of (iv) can be found in [Hut04]. Here, we reduce (iv) to (iii) by exploiting the following elementary fact (well known for integer-valued functions, see e.g. [Sim77, p634]):

Lemma 7 (Approximable = ∅′-estimable) A function is approximable iff it is estimable with the help of the halting oracle.


Proof. By ∅′-computable we mean computable with the help of the halting oracle, or equivalently, computable under extra input of the halting sequence $h = h_{1:\infty}\in\{0,1\}^\infty$, where $h_n = 1 :\Leftrightarrow U(n)$ halts.

Assume f is approximable, i.e. $\forall\varepsilon\,\exists y,m: R(m,y,\varepsilon)$, where the relation $R(m,y,\varepsilon) := [\forall n\geq m: |f_n(x)-y|<\varepsilon]$ and recursive $f_n\to f$. Fix ε > 0. Search (dovetail) for $m\in\mathbb{N}$ and y (ℚ is sufficient) such that $R(m,y,\varepsilon)$ = true. R is co-enumerable, hence ∅′-decidable, hence y can be ∅′-computed, hence f is ∅′-estimable, since $f(x) = y\pm O(\varepsilon)$.

Now assume that f is ∅′-estimable, i.e. $\exists T\in TM\ \forall\varepsilon,x: |T(x,\varepsilon,h)-f(x)|<\varepsilon$. Since h is enumerable, T and hence f are approximable. More formally, let $h^t_n = 1 :\Leftrightarrow U(n)$ halts within t steps. Then $g(x,\varepsilon) := T(x,\varepsilon,h) = T(x,\varepsilon,\lim_{t\to\infty}h^t) = \lim_{t\to\infty}T(x,\varepsilon,h^t)$ is approximable, where the exchange of limits holds, since T only reads $n_{x\varepsilon}<\infty$ bits of h and $h_{1:n_{x\varepsilon}} = h^t_{1:n_{x\varepsilon}}$ for sufficiently large t. □


6 Proof of Theorem 4

We first prove the theorem for discrete (semi)measures P (Definition 6), since it contains the essential ideas in a cleaner form. We then present the proof for continuous (semi)measures μ (Definition 2). We present proofs for binary alphabet $\mathcal{X} = \{0,1\}$ only; the proofs naturally generalize from binary to arbitrary finite alphabet. $\arg\min_x f(x)$ is the x that minimizes f(x). Ties are broken in an arbitrary but computable way (e.g. by taking the smallest x).

Proof (discrete case).

(o) $Q(x) := \sum_{P\in\mathcal{M}}w_P P(x)$ with $w_P>0$ obviously dominates all $P\in\mathcal{M}$ (with constant $w_P$). With $\sum_P w_P = 1$ and all P being discrete (semi)measures, also Q is a discrete (semi)measure.

(i) See [LV97, Thm.4.3.1].

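(ii) Let P be the universal element in $\mathcal{M}_{enum}^{semi}$ and $\alpha := \sum_x P(x)$. We normalize P by $Q(x) := \frac{1}{\alpha}P(x)$. Since α ≤ 1 we have Q(x) ≥ P(x). Hence $Q\stackrel{\times}{\geq}P\stackrel{\times}{\geq}\mathcal{M}_{enum}^{semi}$. As a ratio between two enumerable functions, Q is still approximable, hence $\mathcal{M}_{appr}^{msr}\stackrel{\times}{\geq}\mathcal{M}_{enum}^{semi}$.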

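(iii) Let $P\in\mathcal{M}_{rec}^{semi}$. We partition ℕ into chunks $I_n := \{2^{n-1},...,2^n-1\}$ (n ≥ 1) of increasing size. With $x_n := \arg\min_{x\in I_n}P(x)$ we define $Q(x_n) := \frac{1}{n(n+1)}$ and Q(x) := 0 for all other x. Exploiting that a minimum is smaller than an average and that P is a semimeasure, we get

$$P(x_n) = \min_{x\in I_n}P(x) \;\leq\; \frac{1}{|I_n|}\sum_{x\in I_n}P(x) \;\leq\; \frac{1}{2^{n-1}} \;=\; \frac{n(n+1)}{2^{n-1}}\,Q(x_n).$$

Since $\frac{n(n+1)}{2^{n-1}}\to 0$ for n → ∞, P cannot dominate Q ($P\stackrel{\times}{\not\geq}Q$). With P also Q is recursive. Since P was an arbitrary recursive semimeasure and Q is a recursive measure ($\sum_x Q(x) = \sum_{n\geq 1}\frac{1}{n(n+1)} = 1$), this implies $\mathcal{M}_{rec}^{semi}\stackrel{\times}{\not\geq}\mathcal{M}_{rec}^{msr}$.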
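The chunk construction is effective, so it can be run for a concrete P. In the Python sketch below, the recursive semimeasure $P(x) = 1/(x(x+1))$ is a hypothetical example (any recursive semimeasure would do), `diagonal_point` computes $x_n$ with ties broken toward the smallest x, and `Q` is the resulting recursive measure; the chunk index of x is read off as its bit length.

```python
from fractions import Fraction


def P(x):
    """Hypothetical recursive discrete semimeasure on x = 1, 2, 3, ... (it sums to 1)."""
    return Fraction(1, x * (x + 1))


def diagonal_point(P, n):
    """x_n = argmin of P over the chunk I_n = {2^(n-1), ..., 2^n - 1}; ties go to the smallest x."""
    return min(range(2 ** (n - 1), 2**n), key=lambda x: (P(x), x))


def Q(x, P):
    """Recursive measure with Q(x_n) = 1/(n(n+1)) on the diagonal points and 0 elsewhere."""
    n = x.bit_length()                  # x lies in chunk I_n exactly when it has n bits
    if diagonal_point(P, n) == x:
        return Fraction(1, n * (n + 1))
    return Fraction(0)
```

For n = 16 the ratio $P(x_n)/Q(x_n)$ is already far below $n(n+1)/2^{n-1}\approx 0.008$, so no constant c can satisfy $P\geq c\,Q$ for all n.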

Assume now that there is an estimable semimeasure $S\stackrel{\times}{\geq}\mathcal{M}_{rec}^{msr}$. We construct a recursive semimeasure $P\stackrel{\times}{\geq}S$ as follows. Choose an initial ε > 0 and finitely compute an ε-approximation $\hat S$ of S(x). If $\hat S>2\varepsilon$, define $P(x) := \frac12\hat S$; else halve ε and repeat the process. Since S(x) > 0 (otherwise it could not dominate, e.g., a measure $T\in\mathcal{M}_{rec}^{msr}$ with T(x) > 0 for all x), the loop terminates after finite time. So P is recursive. Inserting $\hat S = 2P(x)$ and $\varepsilon<\frac12\hat S = P(x)$ into $|S(x)-\hat S|<\varepsilon$ we get $|S(x)-2P(x)|<P(x)$, which implies S(x) > P(x) and S(x) < 3P(x). The former implies $\sum_x P(x)\leq\sum_x S(x)\leq 1$, i.e. P is a semimeasure. The latter implies $P\geq\frac13 S\stackrel{\times}{\geq}\mathcal{M}_{rec}^{msr}$. Hence P is a recursive semimeasure dominating all recursive measures, which contradicts what we have proven in the first half of (iii). Hence the assumption on S was wrong, which establishes $\mathcal{M}_{est}^{semi}\stackrel{\times}{\not\geq}\mathcal{M}_{rec}^{msr}$.
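The ε-halving step that turns an estimable S into a recursive P can be sketched as follows. The semimeasure $S(x) = 2^{-x}$ and the approximation routine `S_approx`, which deliberately errs by 0.9ε in alternating directions, are hypothetical stand-ins for an arbitrary estimable semimeasure and its ε-approximation procedure; only `S_approx` is consulted when computing P.

```python
from fractions import Fraction


def S(x):
    """Hypothetical estimable semimeasure on x = 1, 2, 3, ...; P never reads it directly."""
    return Fraction(1, 2**x)


def S_approx(x, eps):
    """Some eps-approximation of S(x); this one errs by 0.9*eps in alternating directions."""
    sign = 1 if x % 2 else -1
    return S(x) + sign * Fraction(9, 10) * eps


def P(x, eps=Fraction(1)):
    """Recursive P from the proof: halve eps until the estimate exceeds 2*eps, then halve it."""
    while True:
        s_hat = S_approx(x, eps)
        if s_hat > 2 * eps:
            return s_hat / 2
        eps /= 2
```

The loop terminates because S(x) > 0, and the returned value satisfies P(x) < S(x) < 3P(x), exactly as derived in the proof.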

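(iv) From (iii) we know that $\mathcal{M}_{est}^{semi}\stackrel{\times}{\not\geq}\mathcal{M}_{rec}^{msr}$. The proof, and hence the result, remains valid under the halting oracle, i.e. $\mathcal{M}_{\emptyset'\text{-}est}^{semi}\stackrel{\times}{\not\geq}\mathcal{M}_{\emptyset'\text{-}rec}^{msr}$. By Lemma 7 the ∅′-estimable functions/(semi)measures coincide with the approximable functions/(semi)measures, hence $\mathcal{M}_{appr}^{semi}\stackrel{\times}{\not\geq}\mathcal{M}_{appr}^{msr}$.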

Proof (continuous case).

The major difference to the discrete case is that one also has to take care that $\rho(x)\geq\rho(x0)+\rho(x1)$, $x\in\{0,1\}^*$ (with equality for measures), is respected. On the other hand, the chunking $I_n := \{0,1\}^n$ is more natural here.

(o) $\rho(x) := \sum_{\nu\in\mathcal{M}}w_\nu\nu(x)$ with $w_\nu>0$ obviously dominates all $\nu\in\mathcal{M}$ (with domination constant $w_\nu$). With $\sum_\nu w_\nu = 1$ and all ν being (semi)measures, also ρ is a (semi)measure.


(i) See [LV97, Thm.4.5.1].
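(ii) Let ξ be a universal element in $\mathcal{M}_{enum}^{semi}$. We define [Sol78]

$$\xi_{norm}(x_{1:n}) := \prod_{t=1}^{n}\frac{\xi(x_{1:t})}{\xi(x_{<t}0)+\xi(x_{<t}1)}.$$

By induction one can show that $\xi_{norm}$ is a measure and that $\xi_{norm}(x)\geq\xi(x)\ \forall x$, hence $\xi_{norm}\stackrel{\times}{\geq}\xi\stackrel{\times}{\geq}\mathcal{M}_{enum}^{semi}$. As a ratio of enumerable functions, $\xi_{norm}$ is still approximable, hence $\mathcal{M}_{appr}^{msr}\stackrel{\times}{\geq}\mathcal{M}_{enum}^{semi}$.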
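The normalization $\xi_{norm}$ is easy to try on a concrete semimeasure. In this minimal Python sketch, the semimeasure $\xi(x) = \frac12 4^{-\ell(x)}+\frac14[x\text{ constant}]$ with ξ(ε) = 1 is a hypothetical example that loses probability mass at every step; `xi_norm` implements the product formula of (ii) with exact fractions, and `strings_up_to` is a helper for checking the claimed properties.

```python
from fractions import Fraction
from itertools import product


def xi(x):
    """Hypothetical semimeasure: xi(eps) = 1, else 1/2 * 4^-|x| plus 1/4 if x is constant."""
    if x == "":
        return Fraction(1)
    bonus = Fraction(1, 4) if len(set(x)) == 1 else Fraction(0)
    return Fraction(1, 2 * 4 ** len(x)) + bonus


def xi_norm(x):
    """Normalized measure: product over t of xi(x_1:t) / (xi(x_<t 0) + xi(x_<t 1))."""
    value = Fraction(1)
    for t in range(1, len(x) + 1):
        prefix = x[: t - 1]
        value *= xi(x[:t]) / (xi(prefix + "0") + xi(prefix + "1"))
    return value


def strings_up_to(n):
    """All binary strings of length < n, including the empty string."""
    for m in range(n):
        for bits in product("01", repeat=m):
            yield "".join(bits)
```

Each factor renormalizes one conditional, so $\xi_{norm}$ satisfies the measure equalities exactly; because every denominator is at most $\xi(x_{<t})$, the product never falls below ξ.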

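(iii) Analogous to the discrete case we could start by recursively defining $x^*_k := \arg\min_{x_k}\mu(x^*_{<k}x_k)$ for $\mu\in\mathcal{M}_{rec}^{semi}$. See [Hut03a] for a proof along this line. Simpler is to directly consider $\mu\in\mathcal{M}_{est}^{semi}$ and to compute $x^*_{1:\infty}$ recursively by computing some ε-approximation $\hat\mu(x_k|x^*_{<k})$ of $\mu(x_k|x^*_{<k})$ and defining $x^*_k := \arg\min_{x_k}\hat\mu(x_k|x^*_{<k})$, which implies $\mu(x^*_k|x^*_{<k})<\frac12+\varepsilon$. Finally we define the measure ρ by $\rho(x^*_{1:n}) = 1\ \forall n$ and ρ(x) = 0 for all x that are not prefixes of $x^*_{1:\infty}$. Hence $\mu(x^*_{1:n})<(\frac12+\varepsilon)^n = (\frac12+\varepsilon)^n\rho(x^*_{1:n})$, which demonstrates that μ does not dominate ρ for $\varepsilon<\frac12$. Since $\mu\in\mathcal{M}_{est}^{semi}$ was arbitrary and ρ is a recursive measure, this implies $\mathcal{M}_{est}^{semi}\stackrel{\times}{\not\geq}\mathcal{M}_{rec}^{msr}$.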
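The diagonal sequence $x^*$ can be computed for any concrete recursive measure. The Python sketch below follows the first route sketched in (iii) and is a simplification: it assumes a measure whose conditionals are exactly computable rationals (here the Laplace mixture, a hypothetical choice), so the argmin is taken over exact values and the bound becomes $\mu(x^*_{1:n})\leq 2^{-n}$ instead of $(\frac12+\varepsilon)^n$. Since $\rho(x^*_{1:n}) = 1$, the returned probability is also the ratio $\mu(x^*_{1:n})/\rho(x^*_{1:n})$.

```python
from fractions import Fraction


def laplace_cond(prefix, b):
    """mu(b | prefix) for the Laplace mixture, a hypothetical choice of recursive measure.

    With k ones among n bits of prefix, mu(1 | prefix) = (k + 1) / (n + 2).
    """
    p1 = Fraction(prefix.count("1") + 1, len(prefix) + 2)
    return p1 if b == "1" else 1 - p1


def diagonal_sequence(mu_cond, n):
    """x*_k = argmin_b mu(b | x*_<k), ties -> "0"; returns x*_1:n and mu(x*_1:n)."""
    x, prob = "", Fraction(1)
    for _ in range(n):
        b = min("01", key=lambda c: mu_cond(x, c))
        prob *= mu_cond(x, b)
        x += b
    return x, prob
```

For the Laplace mixture the construction yields 0101..., the continuation this predictor finds least likely at every step.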

(iv) Identical to the discrete case. □

7 Posterior Convergence 

We investigated in detail the computational properties of various mixture distributions ξ. A mixture $\xi_{\mathcal{M}}$ multiplicatively dominates all distributions in $\mathcal{M}$. We mentioned that dominance implies posterior convergence. In this section we present in more detail what dominance implies and what it does not.

Convergence of $\xi(x_t|x_{<t})$ to $\mu(x_t|x_{<t})$ with μ-probability 1 tells us that $\xi(x_t|x_{<t})$ is close to $\mu(x_t|x_{<t})$ for sufficiently large t on ‘most’ sequences $x_{1:\infty}$. It says nothing about the speed of convergence, nor whether convergence is true for any particular sequence (of measure 0). Convergence in mean sum, defined below, is intended to capture the rate of convergence; Martin-Löf randomness is used to capture convergence properties for individual sequences.

Martin-Löf randomness is a very important concept of randomness of individual sequences, which is closely related to Kolmogorov complexity and Solomonoff’s universal prior. Levin gave a characterization equivalent to Martin-Löf’s original definition:

Theorem 8 (Martin-Löf random sequences) A sequence $x_{1:\infty}$ is μ-Martin-Löf random (μ.M.L.) iff there is a constant c such that $M(x_{1:n})\leq c\cdot\mu(x_{1:n})$ for all n.

An equivalent formulation for estimable μ is:

$$x_{1:\infty}\text{ is }\mu\text{.M.L.-random}\iff Km(x_{1:n}) = -\log\mu(x_{1:n})+O(1)\ \ \forall n. \qquad (7)$$

Theorem 8 follows from (7) by exponentiation, “using $2^{-Km}\approx M$”, and noting that $M\stackrel{\times}{\geq}\mu$ follows from universality of M. Consider the special case of μ being a fair coin, i.e. $\mu(x_{1:n}) = 2^{-n}$; then $x_{1:\infty}$ is M.L. random iff $Km(x_{1:n}) = n+O(1)$, i.e. iff $x_{1:n}$ is incompressible. For general μ, $-\log\mu(x_{1:n})$ is the length of the Shannon-Fano code of $x_{1:n}$, hence $x_{1:\infty}$ is μ.M.L.-random iff the Shannon-Fano code is optimal.

One can show that a μ.M.L.-random sequence $x_{1:\infty}$ passes all thinkable effective randomness tests, e.g. the law of large numbers, the law of the iterated logarithm, etc. In particular, the set of all μ.M.L.-random sequences has μ-measure 1. The following generalization is natural when considering general Bayes mixtures ξ as in this work:

Definition 9 (μ/ξ-random sequences) A sequence $x_{1:\infty}$ is called μ/ξ-random (μ.ξ.r.) iff there is a constant c such that $\xi(x_{1:n})\leq c\cdot\mu(x_{1:n})$ for all n.

Typically, ξ is a mixture over some $\mathcal{M}$ as defined in (6), in which case the reverse inequality $\xi(x)\stackrel{\times}{\geq}\mu(x)$ is also true (for all x, provided $\mu\in\mathcal{M}$). For finite $\mathcal{M}$ or if $\xi\in\mathcal{M}$, the definition of μ/ξ-randomness depends only on $\mathcal{M}$, and not on the specific weights $w_\nu$ used in ξ. For $\mathcal{M} = \mathcal{M}_{enum}^{semi}$, μ/ξ-randomness is just μ.M.L.-randomness. The larger $\mathcal{M}$, the more patterns are recognized as nonrandom. Roughly speaking, those regularities characterized by some $\nu\in\mathcal{M}$ are recognized by μ/ξ-randomness, i.e. for $\mathcal{M}\subset\mathcal{M}_{enum}^{semi}$ some μ/ξ-random strings may not be M.L. random. Other randomness concepts, e.g. those by Schnorr, Ko, van Lambalgen, Lutz, Kurtz, von Mises, Wald, and Church (see [Wan96, Lam87, Sch71]), could possibly also be characterized in terms of μ/ξ-randomness for particular choices of $\mathcal{M}$.

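A classical (nonrandom) real-valued sequence $a_t$ is defined to converge to $a_*$, short $a_t\to a_*$, if $\forall\varepsilon\,\exists t_0\,\forall t\ge t_0: |a_t - a_*| < \varepsilon$. We are interested in convergence properties of random sequences $z_t(\omega)$ for $t\to\infty$ (e.g. $z_t(\omega) = \xi(\omega_t|\omega_{<t}) - \mu(\omega_t|\omega_{<t})$). We denote $\mu$-expectations by $\mathbf{E}$. The expected value of a function $f:\mathcal{X}^t\to\mathbb{R}$, dependent on $x_{1:t}$, independent of $x_{t+1:\infty}$, and possibly undefined on a set of $\mu$-measure 0, is $\mathbf{E}[f] = \sum'_{x_{1:t}\in\mathcal{X}^t}\mu(x_{1:t})f(x_{1:t})$. The prime denotes that the sum is restricted to $x_{1:t}$ with $\mu(x_{1:t})\ne 0$. Similarly we use $\mathbf{P}[..]$ to denote the $\mu$-probability of event $[..]$. We define four convergence concepts for random sequences.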

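Definition 10 (Convergence of random sequences) Let $z_1(\omega), z_2(\omega),\ldots$ be a sequence of real-valued random variables. $z_t$ is said to converge for $t\to\infty$ to (random variable) $z_*(\omega)$

i) with probability 1 (w.p.1) $:\Leftrightarrow \mathbf{P}[\{\omega : z_t\to z_*\}] = 1$,

ii) in mean sum (i.m.s.) $:\Leftrightarrow \sum_{t=1}^\infty \mathbf{E}[(z_t - z_*)^2] < \infty$,

iii) for every $\mu$-Martin-Löf random sequence ($\mu$.M.L.) $:\Leftrightarrow$

$\forall\omega$: if $[\exists c\,\forall n: M(\omega_{1:n}) \le c\,\mu(\omega_{1:n})]$ then $z_t(\omega)\to z_*(\omega)$ for $t\to\infty$,

iv) for every $\mu/\xi$-random sequence ($\mu.\xi$.r.) $:\Leftrightarrow$

$\forall\omega$: if $[\exists c\,\forall n: \xi(\omega_{1:n}) \le c\,\mu(\omega_{1:n})]$ then $z_t(\omega)\to z_*(\omega)$ for $t\to\infty$.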
In statistics, (i) is the "default" characterization of convergence of random sequences. Convergence i.m.s. (ii) is very strong: it provides a rate of convergence in the sense that the expected number of times $t$ in which $z_t$ deviates more than $\varepsilon$ from $z_*$ is finite and bounded by $c/\varepsilon^2$, and the probability that the number of $\varepsilon$-deviations exceeds $c/(\varepsilon^2\delta)$ is smaller than $\delta$, where $c := \sum_{t=1}^\infty\mathbf{E}[(z_t - z_*)^2]$. Nothing can be said about the times $t$ at which these deviations occur. If, additionally, $|z_t - z_*|$ were monotone decreasing, then $|z_t - z_*| = o(t^{-1/2})$ could be concluded. (iii) uses Martin-Löf's notion of randomness of individual sequences to define convergence M.L. Since this work deals with general Bayes mixtures $\xi$, we generalized in (iv) the definition of convergence M.L. based on $M$ to convergence based on $\xi$ in a natural way. One can show that convergence i.m.s. implies convergence w.p.1. Also convergence M.L. implies convergence w.p.1.
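The rate statement for convergence i.m.s. follows from Markov's inequality. A short derivation sketch, where $N_\varepsilon := \#\{t : |z_t - z_*| > \varepsilon\}$ is our own shorthand for the number of $\varepsilon$-deviations and $c$ is the constant defined above:

```latex
\mathbf{E}[N_\varepsilon]
  = \sum_{t=1}^{\infty} \mathbf{P}\big[|z_t - z_*| > \varepsilon\big]
  \le \sum_{t=1}^{\infty} \frac{\mathbf{E}[(z_t - z_*)^2]}{\varepsilon^2}
  = \frac{c}{\varepsilon^2},
\qquad
\mathbf{P}\Big[N_\varepsilon \ge \frac{c}{\varepsilon^2\delta}\Big]
  \le \frac{\varepsilon^2\delta}{c}\,\mathbf{E}[N_\varepsilon]
  \le \delta .
```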
Universality of $\xi$ implies the following posterior convergence results:


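Theorem 11 (Convergence of $\xi$ to $\mu$) Let there be sequences $x_1x_2\ldots$ over a finite alphabet $\mathcal{X}$ drawn with probability $\mu(x_{1:n})$ for the first $n$ symbols, where $\mu\in\mathcal{M}$ is a measure and $\mathcal{M}$ a countable set of (semi)measures. The universal/mixture posterior probability $\xi(x_t|x_{<t})$ of the next symbol $x_t$ given $x_{<t}$ is related to the true posterior probability $\mu(x_t|x_{<t})$ in the following way:

$$\sum_{t=1}^n \mathbf{E}\bigg[\Big(\sqrt{\tfrac{\xi(x_t|x_{<t})}{\mu(x_t|x_{<t})}}-1\Big)^2\bigg] \;\le\; \sum_{t=1}^n \mathbf{E}\bigg[\sum_{x'_t}\Big(\sqrt{\xi(x'_t|x_{<t})}-\sqrt{\mu(x'_t|x_{<t})}\Big)^2\bigg] \;\le\; \ln w_\mu^{-1} \;<\; \infty$$

where $w_\mu$ is the weight of $\mu$ in $\xi$.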


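Theorem 11 implies

$$\xi(x'_t|x_{<t}) \to \mu(x'_t|x_{<t}) \text{ for any } x'_t \quad\text{and}\quad \frac{\xi(x_t|x_{<t})}{\mu(x_t|x_{<t})}\to 1, \quad\text{both i.m.s. for } t\to\infty.$$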

The latter strengthens the result $\xi(x_t|x_{<t})/\mu(x_t|x_{<t})\to 1$ w.p.1 derived by Gács [LV97, Thm.5.2.2] in that it also provides the "speed" of convergence.
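To make Theorem 11 concrete, the following sketch computes the middle sum exactly for a small, hypothetical class (two Bernoulli measures with $\theta = 0.3$ and $\theta = 0.7$, equal weights, true $\mu$ = Bernoulli(0.3), so $\ln w_\mu^{-1} = \ln 2$) by enumerating all histories $x_{<t}$. The class, the weights and the horizon are illustrative assumptions, not taken from the text.

```python
import itertools
import math

# Hypothetical setup: two Bernoulli measures, equal prior weights,
# true environment mu = Bernoulli(0.3), hence w_mu = 0.5.
THETAS = [0.3, 0.7]
WEIGHTS = [0.5, 0.5]
MU = 0  # index of the true measure in THETAS


def seq_prob(theta, xs):
    """Probability of the binary tuple xs under i.i.d. Bernoulli(theta)."""
    ones = sum(xs)
    return theta ** ones * (1 - theta) ** (len(xs) - ones)


def xi_prob(xs):
    """Mixture probability xi(xs) = sum_nu w_nu * nu(xs)."""
    return sum(w * seq_prob(th, xs) for w, th in zip(WEIGHTS, THETAS))


def hellinger_sum(n):
    """Exact sum_{t=1}^n E[ sum_{x'} (sqrt(xi(x'|x_<t)) - sqrt(mu(x'|x_<t)))^2 ]."""
    theta_mu = THETAS[MU]
    total = 0.0
    for t in range(1, n + 1):
        # enumerate every history x_<t of length t-1, weighted by its mu-probability
        for hist in itertools.product((0, 1), repeat=t - 1):
            h_t = 0.0
            for sym in (0, 1):
                xi_cond = xi_prob(hist + (sym,)) / xi_prob(hist)
                mu_cond = theta_mu if sym == 1 else 1 - theta_mu
                h_t += (math.sqrt(xi_cond) - math.sqrt(mu_cond)) ** 2
            total += seq_prob(theta_mu, hist) * h_t
    return total


bound = math.log(1 / WEIGHTS[MU])  # ln w_mu^{-1}
for n in (1, 5, 10):
    print(n, hellinger_sum(n), bound)
```

Exact enumeration is exponential in $n$, so this is only meant for tiny horizons; it avoids any sampling noise in the comparison with the bound.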

Note also the subtle difference between the two convergence results. For any sequence $x'_{1:\infty}$ (possibly constant and not necessarily $\mu$-random), $\mu(x'_t|x_{<t}) - \xi(x'_t|x_{<t})$ converges to zero w.p.1 (referring to $x_{1:\infty}$), but no statement is possible for $\xi(x'_t|x_{<t})/\mu(x'_t|x_{<t})$, since $\liminf\mu(x'_t|x_{<t})$ could be zero. On the other hand, if we stay on-sequence ($x'_{1:\infty} = x_{1:\infty}$), we have $\xi(x_t|x_{<t})/\mu(x_t|x_{<t})\to 1$ w.p.1 (whether $\inf\mu(x_t|x_{<t})$ tends to zero or not does not matter). Indeed, it is easy to give an example where $\xi(x'_t|x_{<t})/\mu(x'_t|x_{<t})$ diverges. If we choose

$$\mathcal{M} = \{\mu_1,\mu_2\},\quad \mu = \mu_1,\quad \mu_1(1|x_{<t}) = \tfrac{1}{2}t^{-3} \quad\text{and}\quad \mu_2(1|x_{<t}) = \tfrac{1}{2}t^{-2},$$

the contribution of $\mu_2$ to $\xi$ causes $\xi(1|x_{<t})$ to fall off like $\mu_2(1|x_{<t})\sim t^{-2}$, much slower than $\mu_1(1|x_{<t})\sim t^{-3}$, causing the quotient to diverge:
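$$\mu_1(0_{1:n}) = \prod_{t=1}^n\Big(1-\frac{1}{2t^3}\Big) \;\longrightarrow\; c_1 = 0.450\ldots > 0 \quad\Rightarrow\quad 0_{1:\infty} \text{ is a } \mu\text{-random sequence},$$

$$\mu_2(0_{1:n}) = \prod_{t=1}^n\Big(1-\frac{1}{2t^2}\Big) \;\longrightarrow\; c_2 = 0.358\ldots > 0 \quad\Rightarrow\quad \xi(0_{1:n}) \;\longrightarrow\; w_1c_1 + w_2c_2 =: c_\xi > 0,$$

$$\xi(1|0_{<t}) = \frac{w_1\mu_1(1|0_{<t})\mu_1(0_{<t}) + w_2\mu_2(1|0_{<t})\mu_2(0_{<t})}{\xi(0_{<t})} \;\simeq\; \frac{w_2c_2}{c_\xi}\cdot\frac{1}{2t^2},$$

$$\frac{\xi(1|0_{<t})}{\mu(1|0_{<t})} \;\simeq\; \frac{w_2c_2}{c_\xi}\,t \;\longrightarrow\; \infty \quad\text{diverges.}$$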
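A quick numerical sanity check of this example, as a sketch: the equal weights $w_1 = w_2 = \frac12$ are our assumption (the text keeps them general). It follows the all-zero sequence, tracks $\mu_1(0_{<t})$ and $\mu_2(0_{<t})$, and records the quotient $\xi(1|0_{<t})/\mu(1|0_{<t})$, which should grow roughly like $(w_2c_2/c_\xi)\,t$.

```python
import math

# Example from the text: true mu_1(1|x_<t) = 1/(2 t^3) and mu_2(1|x_<t) = 1/(2 t^2).
# The equal prior weights are an assumption; the text keeps w_1, w_2 general.
W1, W2 = 0.5, 0.5


def p1(t):
    return 1.0 / (2 * t ** 3)


def p2(t):
    return 1.0 / (2 * t ** 2)


def along_zeros(t_max):
    """Follow 0_{1:t_max}; return the quotients xi(1|0_<t)/mu_1(1|0_<t) and the final products."""
    m1 = m2 = 1.0  # mu_1(0_<t), mu_2(0_<t); the empty prefix has probability 1
    quotients = {}
    for t in range(1, t_max + 1):
        xi_prefix = W1 * m1 + W2 * m2                              # xi(0_<t)
        xi_one = (W1 * p1(t) * m1 + W2 * p2(t) * m2) / xi_prefix  # xi(1|0_<t)
        quotients[t] = xi_one / p1(t)
        m1 *= 1 - p1(t)                                            # extend the prefix by a 0
        m2 *= 1 - p2(t)
    return quotients, m1, m2


quotients, c1, c2 = along_zeros(100_000)
print(c1, c2)                   # approximations of c_1 = 0.450... and c_2 = 0.358...
for t in (10, 100, 1000, 10_000):
    print(t, quotients[t])      # grows roughly linearly in t
```

The truncated products converge quickly, so $10^5$ factors are ample for three significant digits of $c_1$ and $c_2$.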


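Proof. For a probability distribution $y_i\ge 0$ with $\sum_i y_i = 1$ and a semi-distribution $z_i\ge 0$ with $\sum_i z_i\le 1$ and $i\in\{1,\ldots,N\}$, the Hellinger distance $h(y,z) := \sum_i(\sqrt{y_i}-\sqrt{z_i})^2$ is upper bounded by the relative entropy $d(y,z) = \sum_i y_i\ln\frac{y_i}{z_i}$ (and $0\ln\frac{0}{z} := 0$). This can be seen as follows: For arbitrary $0\le y\le 1$ and $0\le z\le 1$ we define

$$f(y,z) := y\ln\frac{y}{z} - (\sqrt{y}-\sqrt{z})^2 + z - y = 2y\,g\Big(\sqrt{z/y}\Big) \quad\text{with}\quad g(t) := -\ln t + t - 1 \ge 0.$$

This shows $f\ge 0$, and hence $\sum_i f(y_i,z_i)\ge 0$, which implies

$$\sum_i y_i\ln\frac{y_i}{z_i} - \sum_i(\sqrt{y_i}-\sqrt{z_i})^2 \;\ge\; \sum_i y_i - \sum_i z_i \;\ge\; 1 - 1 = 0.$$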
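The inequality $h \le d$ can also be spot-checked numerically. The sketch below draws random probability vectors $y$ and random semi-distributions $z$ (positive entries, total mass at most 1; the sampling scheme is an arbitrary choice of ours) and records the smallest observed value of $d(y,z) - h(y,z)$.

```python
import math
import random


def hellinger(y, z):
    """h(y,z) = sum_i (sqrt(y_i) - sqrt(z_i))^2."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(y, z))


def relative_entropy(y, z):
    """d(y,z) = sum_i y_i ln(y_i/z_i), using the convention 0 ln(0/z) = 0."""
    return sum(a * math.log(a / b) for a, b in zip(y, z) if a > 0)


def random_pair(n, rng):
    """Random probability vector y and semi-distribution z with positive entries, sum(z) <= 1."""
    y = [rng.random() for _ in range(n)]
    s = sum(y)
    y = [a / s for a in y]
    mass = rng.uniform(0.1, 1.0)                  # total mass of z, at most 1
    z = [rng.random() + 1e-9 for _ in range(n)]
    s = sum(z)
    z = [b * mass / s for b in z]
    return y, z


rng = random.Random(0)
gaps = []
for _ in range(10_000):
    y, z = random_pair(rng.randint(2, 6), rng)
    gaps.append(relative_entropy(y, z) - hellinger(y, z))
print(min(gaps))   # h <= d predicts a nonnegative value (up to rounding)
```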

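The (conditional) $\mu$-expectations of a function $f:\mathcal{X}^t\to\mathbb{R}$ are defined as

$$\mathbf{E}[f] = \sum'_{x_{1:t}\in\mathcal{X}^t}\mu(x_{1:t})f(x_{1:t}) \quad\text{and}\quad \mathbf{E}_t[f] := \mathbf{E}[f|x_{<t}] = \sum'_{x_t\in\mathcal{X}}\mu(x_t|x_{<t})f(x_{1:t}),$$

where $\sum'$ sums over all $x_t$ or $x_{1:t}$ for which $\mu(x_{1:t})\ne 0$. If we insert $\mathcal{X} = \{1,\ldots,N\}$, $N = |\mathcal{X}|$, $i = x_t$, $y_i = \mu_t := \mu(x_t|x_{<t})$, and $z_i = \xi_t := \xi(x_t|x_{<t})$ into $h$ and $d$ we get (w.p.1)

$$h_t(x_{<t}) := \sum_{x_t}\big(\sqrt{\mu_t}-\sqrt{\xi_t}\big)^2 \;\le\; d_t(x_{<t}) := \sum_{x_t}\mu_t\ln\frac{\mu_t}{\xi_t} = \mathbf{E}_t\Big[\ln\frac{\mu_t}{\xi_t}\Big].$$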

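Taking the expectation $\mathbf{E}$ and the sum $\sum_{t=1}^n$ we get

$$\sum_{t=1}^n\mathbf{E}[h_t(x_{<t})] \;\le\; \sum_{t=1}^n\mathbf{E}\Big[\mathbf{E}_t\Big[\ln\frac{\mu_t}{\xi_t}\Big]\Big] = \mathbf{E}\Big[\ln\prod_{t=1}^n\frac{\mu_t}{\xi_t}\Big] = \mathbf{E}\Big[\ln\frac{\mu(x_{1:n})}{\xi(x_{1:n})}\Big] \;\le\; \ln w_\mu^{-1} \tag{8}$$

where we have used $\mathbf{E}[\mathbf{E}_t[..]] = \mathbf{E}[..]$ and exchanged the $t$-sum with the expectation $\mathbf{E}$, which transforms to a product inside the logarithm. In the last equality we have used the chain rule for $\mu$ and $\xi$. Using universality $\xi(x_{1:n})\ge w_\mu\mu(x_{1:n})$ yields the final inequality. Finally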


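$$\mathbf{E}_t\Big[\Big(\sqrt{\tfrac{\xi_t}{\mu_t}}-1\Big)^2\Big] = \sum'_{x_t}\mu_t\Big(\sqrt{\tfrac{\xi_t}{\mu_t}}-1\Big)^2 = \sum'_{x_t}\big(\sqrt{\xi_t}-\sqrt{\mu_t}\big)^2 \;\le\; h_t(x_{<t}).$$

Taking the expectation $\mathbf{E}$ and the sum $\sum_{t=1}^n$ and chaining the result with (8) yields Theorem 11. □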
8 Convergence in Martin-Löf Sense

An interesting open question is whether $\xi$ converges to $\mu$ (in difference or ratio) individually for all Martin-Löf random sequences. Clearly, convergence $\mu$.M.L. may at most fail for a set of sequences with $\mu$-measure zero. A convergence M.L. result would be particularly interesting and natural for Solomonoff's universal prior $M$, since M.L. randomness can be defined in terms of $M$ (see Theorem 8). Attempts to convert the bounds in Theorem 11 to effective $\mu$.M.L.-randomness tests fail, since $M(x_t|x_{<t})$ is not enumerable. The proof of $M/\mu\to 1$ given in [LV97, Thm.5.2.2] and [VL00, Thm.10] is incomplete.¹ The implication "$M(x_{1:n})\le c\cdot\mu(x_{1:n})\ \forall n \Rightarrow \lim_{n\to\infty}M(x_{1:n})/\mu(x_{1:n})$ exists" has been used, but not proven, and is indeed generally wrong [HM04]. Theorem 8 only implies $\sup_n M(x_{1:n})/\mu(x_{1:n}) < \infty$ for M.L. random sequences $x_{1:\infty}$, and [Doo53, pp. 324-325] implies only that $\lim_{n\to\infty}M(x_{1:n})/\mu(x_{1:n})$ exists w.p.1, and not $\mu$.M.L. Vovk [Vov87] shows that for two estimable semimeasures $\mu$ and $\rho$ and $x_{1:\infty}$ being $\mu$ and $\rho$ M.L. random,

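$$\sum_{t=1}^\infty\sum_{x'_t}\big(\mu(x'_t|x_{<t})-\rho(x'_t|x_{<t})\big)^2 < \infty \quad\text{and}\quad \sum_{t=1}^\infty\Big(\frac{\rho(x_t|x_{<t})}{\mu(x_t|x_{<t})}-1\Big)^2 < \infty.$$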

If $M$ were estimable, then this would imply posterior $M\to\mu$ and $M/\mu\to 1$ for every $\mu$.M.L.-random sequence $x_{1:\infty}$, since every sequence is $M$.M.L. random. Since $M$ is not estimable, Vovk's theorem cannot be applied and it is not obvious how to generalize it. So the question of individual convergence remains open. More generally, one may ask whether $\xi\to\mu$ for every $\mu/\xi$-random sequence. It turns out that this is true for some $\mathcal{M}$, but false for others.

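Theorem 12 ($\mu/\xi$-convergence of $\xi$ to $\mu$) Let $\mathcal{X} = \{0,1\}$ be binary and $\mathcal{M}_\Theta := \{\mu_\theta : \mu_\theta(1|x_{<t}) = \theta\ \forall t,\ \theta\in\Theta\}$ be the set of Bernoulli($\theta$) distributions with parameters $\theta\in\Theta$. Let $\Theta_D$ be a countable dense subset of $[0,1]$, e.g. $[0,1]\cap\mathbb{Q}$, and let $\Theta_G$ be a countable subset of $[0,1]$ with a gap in the sense that there exist $0<\theta_0<\theta_1<1$ such that $[\theta_0,\theta_1]\cap\Theta_G = \{\theta_0,\theta_1\}$, e.g. $\Theta_G = \{\frac13,\frac23\}$ or $\Theta_G = ([0,\frac13]\cup[\frac23,1])\cap\mathbb{Q}$. Then

i) If $x_{1:\infty}$ is $\mu/\xi_{\mathcal{M}_{\Theta_D}}$-random with $\mu\in\mathcal{M}_{\Theta_D}$, then $\xi_{\mathcal{M}_{\Theta_D}}(x_t|x_{<t})\to\mu(x_t|x_{<t})$,

ii) There are $\mu\in\mathcal{M}_{\Theta_G}$ and $\mu/\xi_{\mathcal{M}_{\Theta_G}}$-random $x_{1:\infty}$ for which $\xi_{\mathcal{M}_{\Theta_G}}(x_t|x_{<t})\not\to\mu(x_t|x_{<t})$.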

¹The formulation of their theorem is quite misleading in general: "Let $\mu$ be a positive recursive measure. If the length of $y$ is fixed and the length of $x$ grows to infinity, then $M(y|x)/\mu(y|x)\to 1$ with $\mu$-probability one. The infinite sequences $\omega$ with prefixes $x$ satisfying the displayed asymptotics are precisely ('⇒' and '⇐') the $\mu$-random sequences." First, for off-sequence $y$ convergence w.p.1 does not hold ($xy$ must be demanded to be a prefix of $\omega$). Second, the proof of '⇐' has gaps (see main text). Last, '⇒' is given without proof and is wrong [HM04]. Also the assertion in [LV97, Thm.5.2.1] that $S_t := \mathbf{E}\sum_{x'_t}(\mu(x'_t|x_{<t})-M(x'_t|x_{<t}))^2$ converges to zero faster than $1/t$ cannot be made, since $S_t$ does not decrease monotonically [Hut01, Prob.2.7]. For example, for $a_t := 1/\sqrt{t}$ if $t$ is a cube and 0 otherwise, we have $\sum_{t=1}^\infty a_t < \infty$, but $a_t\ne o(1/t)$.
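The counterexample at the end of the footnote can be checked directly: the nonzero terms sit at cubes $t = k^3$, where $a_t = k^{-3/2}$, so $\sum_t a_t = \sum_k k^{-3/2} < \infty$ (bounded by $\zeta(3/2)\approx 2.612$), while $t\,a_t = \sqrt{t}$ is unbounded on cubes. A short sketch over a finite horizon (the horizon $10^6$ is arbitrary):

```python
import math

T = 10 ** 6                        # arbitrary finite horizon
k_max = round(T ** (1 / 3))        # largest k with k^3 <= T (here 100)
# a_t = 1/sqrt(t) if t is a perfect cube and 0 otherwise, so only t = k^3 contribute
cubes = [k ** 3 for k in range(1, k_max + 1) if k ** 3 <= T]
partial_sum = sum(1 / math.sqrt(t) for t in cubes)    # = sum_k k^(-3/2), bounded by zeta(3/2)
t_times_a = [t * (1 / math.sqrt(t)) for t in cubes]   # t * a_t = sqrt(t) on cubes
print(partial_sum, max(t_times_a))
```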
Our original/main motivation for studying $\mu/\xi$-randomness is the implication of Theorem 12 that $M\to\mu$ cannot be decided from $M$ being a mixture distribution or from the universality property alone. Further structural properties of $\mathcal{M}_{enum}^{semi}$ have to be employed. For Bernoulli sequences, convergence $\mu.\xi_\Theta$.r. is related to denseness of $\mathcal{M}_\Theta$. Maybe a denseness characterization of $\mathcal{M}_{enum}^{semi}$ can solve the question of convergence M.L. of $M$. The property $M\in\mathcal{M}_{enum}^{semi}$ is also not sufficient to resolve this question, since there are $\mathcal{M}\ni\xi$ for which $\xi\not\to\mu$ and $\mathcal{M}\ni\xi$ for which $\xi\to\mu$. Theorem 12 can be generalized to i.i.d. sequences over a general finite alphabet $\mathcal{X}$.

The idea to prove (ii) is to construct a sequence $x_{1:\infty}$ that is $\mu_{\theta_0}/\xi$-random and $\mu_{\theta_1}/\xi$-random for $\theta_0\ne\theta_1$. This is possible if and only if $\Theta$ contains a gap and $\theta_0$ and $\theta_1$ are the boundaries of the gap. Obviously $\xi$ cannot converge to both $\theta_0$ and $\theta_1$, thus proving non-convergence. For no $\theta\in[0,1]$ will this $x_{1:\infty}$ be $\mu_\theta$ M.L.-random. Finally, the proof of Theorem 12 makes essential use of the mixture representation of $\xi$, as opposed to the proof of Theorem 11, which only needs dominance $\xi\stackrel{\times}{\geq}\mathcal{M}$.

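An example for (ii) is $\mathcal{M} = \{\mu_0,\mu_1\}$ with $\mu_0(1|x_{<t}) = \mu_1(0|x_{<t}) = \frac13$ and $x_{1:\infty} = (01)^\infty = 01010101\ldots$. Then $\mu_0(x_{1:2n}) = \mu_1(x_{1:2n}) = \xi(x_{1:2n}) = (\frac29)^n$, so $x_{1:\infty}$ is $\mu_0/\xi$-random and $\mu_1/\xi$-random, but

$$\mu_0(x_{2n}|x_{<2n}) = \mu_1(x_{2n+1}|x_{1:2n}) = \tfrac13,\qquad \mu_1(x_{2n}|x_{<2n}) = \mu_0(x_{2n+1}|x_{1:2n}) = \tfrac23,$$

and, for $w_0 = w_1 = \frac12$, $\xi(x_{2n}|x_{<2n}) = \frac49$ and $\xi(x_{2n+1}|x_{1:2n}) = \frac12$, hence $\xi(x_n|x_{<n})\not\to\mu_{0/1}(x_n|x_{<n})$.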
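The numbers in this example can be reproduced with exact rational arithmetic. A minimal sketch using Python's `fractions.Fraction`, with exactly the parameters stated above ($\mu_0(1) = \frac13$, $\mu_1(1) = \frac23$, $w_0 = w_1 = \frac12$, $x = (01)^\infty$ truncated to 20 symbols):

```python
from fractions import Fraction

# Example from the text: mu_0(1) = 1/3, mu_1(1) = 2/3, x = (01)^infinity, w_0 = w_1 = 1/2.
THETA = {0: Fraction(1, 3), 1: Fraction(2, 3)}
W = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def prob(theta, xs):
    """Bernoulli(theta) probability of the binary list xs."""
    p = Fraction(1)
    for bit in xs:
        p *= theta if bit == 1 else 1 - theta
    return p


def xi(xs):
    """Mixture probability xi(xs) = sum_k w_k * mu_k(xs)."""
    return sum(W[k] * prob(THETA[k], xs) for k in THETA)


x = [0, 1] * 10                        # x_1 ... x_20 of (01)^infinity
for t in range(1, len(x) + 1):         # t is the 1-based position of x_t
    prefix, sym = x[: t - 1], x[t - 1]
    xi_cond = xi(prefix + [sym]) / xi(prefix)
    print(t, xi_cond, prob(THETA[0], [sym]), prob(THETA[1], [sym]))
```

Each printed row shows $\xi(x_t|x_{<t})$ next to $\mu_0(x_t|x_{<t})$ and $\mu_1(x_t|x_{<t})$; the mixture alternates between $\frac12$ and $\frac49$ and matches neither component.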

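Proof. Let $\mathcal{X} = \{0,1\}$ and $\mathcal{M} = \{\mu_\theta : \theta\in\Theta\}$ with countable $\Theta\subset[0,1]$ and $\mu_\theta(1|x_{1:n}) = \theta = 1-\mu_\theta(0|x_{1:n})$, which implies

$$\mu_\theta(x_{1:n}) = \theta^{n_1}(1-\theta)^{n-n_1}, \qquad n_1 := x_1+\ldots+x_n, \qquad \hat\theta\equiv\hat\theta_n := \frac{n_1}{n}.$$

$\hat\theta$ depends on $n$; all other used/defined $\theta$ will be independent of $n$. We assume $\theta_{..}\in\Theta$, where $..$ stands for some (possibly empty) index, and $\theta^{..}\in[0,1]$ (possibly $\notin\Theta$), where $^{..}$ stands for some superscript or accent such as $\hat\theta$, $\bar\theta$, $\tilde\theta$, i.e. $\mu_{\theta_{..}}$ and $w_{\theta_{..}}$ make sense, whereas $\mu_{\theta^{..}}$ and $w_{\theta^{..}}$ do not. $\xi$ is defined in the standard way as

$$\xi(x_{1:n}) = \sum_{\theta\in\Theta}w_\theta\mu_\theta(x_{1:n}) \quad\Rightarrow\quad \xi(x_{1:n})\ge w_\theta\mu_\theta(x_{1:n}), \tag{9}$$

where $\sum_\theta w_\theta = 1$ and $w_\theta > 0\ \forall\theta$. In the following let $\mu = \mu_{\theta_0}$ be the true environment.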

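$$\omega = x_{1:\infty} \text{ is } \mu/\xi\text{-random} \quad\Leftrightarrow\quad \xi(x_{1:n})\le c_\omega\cdot\mu_{\theta_0}(x_{1:n})\ \ \forall n \tag{10}$$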

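For binary alphabet it is sufficient to establish whether $\xi(1|x_{1:n})\to\theta_0\equiv\mu(1|x_{1:n})$ for $\mu/\xi$-random $x_{1:\infty}$ in order to decide $\xi(x_n|x_{<n})\to\mu(x_n|x_{<n})$. We need the following posterior representation of $\xi$:

$$\xi(1|x_{1:n}) = \sum_{\theta\in\Theta}w_\theta^n\mu_\theta(1|x_{1:n}), \qquad w_\theta^n := w_\theta\frac{\mu_\theta(x_{1:n})}{\xi(x_{1:n})} \le \frac{w_\theta}{w_{\theta_0}}\frac{\mu_\theta(x_{1:n})}{\mu_{\theta_0}(x_{1:n})}, \qquad \sum_{\theta\in\Theta}w_\theta^n = 1 \tag{11}$$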
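The ratio $\mu_\theta/\mu_{\theta_0}$ can be represented as follows:

$$\frac{\mu_\theta(x_{1:n})}{\mu_{\theta_0}(x_{1:n})} = \bigg[\Big(\frac{\theta}{\theta_0}\Big)^{\hat\theta_n}\Big(\frac{1-\theta}{1-\theta_0}\Big)^{1-\hat\theta_n}\bigg]^n = e^{n[D(\hat\theta_n\|\theta_0)-D(\hat\theta_n\|\theta)]} \tag{12}$$

where $D(\hat\theta\|\theta) = \hat\theta\ln\frac{\hat\theta}{\theta} + (1-\hat\theta)\ln\frac{1-\hat\theta}{1-\theta}$ is the relative entropy between $\hat\theta$ and $\theta$, which is continuous in $\hat\theta$ and $\theta$, and is 0 if and only if $\hat\theta = \theta$. We also need the following implication for sets $\Omega\subseteq\Theta$: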

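$$\text{If } w_\theta^n\le w_\theta\,g_\theta(n) \text{ with } g_\theta(n)\xrightarrow{n\to\infty}0 \text{ and } g_\theta(n)\le c\ \ \forall\theta\in\Omega,\ \forall n, \quad\text{then}\quad \sum_{\theta\in\Omega}w_\theta^n\mu_\theta(1|x_{1:n}) \le \sum_{\theta\in\Omega}w_\theta^n\xrightarrow{n\to\infty}0, \tag{13}$$

which easily follows from boundedness $\sum_\theta w_\theta\le 1$ and $\mu_\theta(1|x_{1:n})\le 1$ [Hut04, Lem.5.28]. We now prove Theorem 12. We leave the special considerations necessary when $0,1\in\Theta$ to the reader and assume, henceforth, $0,1\notin\Theta$.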
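The representations (11) and (12) are easy to confirm numerically for a finite class. In this sketch the class $\Theta = \{0.2, 0.4, 0.6\}$, the weights, the true parameter $\theta_0 = 0.4$ and the observed prefix are all hypothetical choices. It computes the posterior weights $w_\theta^n$, compares $\xi(1|x_{1:n})$ obtained directly from the mixture with the posterior average $\sum_\theta w_\theta^n\theta$, and compares the likelihood ratio $\mu_\theta/\mu_{\theta_0}$ with its relative-entropy form.

```python
import math


def kl(a, b):
    """Binary relative entropy D(a||b) = a ln(a/b) + (1-a) ln((1-a)/(1-b)), with 0 ln 0 := 0."""
    total = 0.0
    if a > 0:
        total += a * math.log(a / b)
    if a < 1:
        total += (1 - a) * math.log((1 - a) / (1 - b))
    return total


def bernoulli_prob(theta, xs):
    """mu_theta(x_{1:n}) = theta^{n_1} (1-theta)^{n-n_1}."""
    ones = sum(xs)
    return theta ** ones * (1 - theta) ** (len(xs) - ones)


# Hypothetical finite class, weights, true parameter and observed prefix
thetas = [0.2, 0.4, 0.6]
weights = [0.5, 0.3, 0.2]
theta0 = 0.4
xs = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0]
n, theta_hat = len(xs), sum(xs) / len(xs)

xi_n = sum(w * bernoulli_prob(th, xs) for w, th in zip(weights, thetas))
post = [w * bernoulli_prob(th, xs) / xi_n for w, th in zip(weights, thetas)]  # w_theta^n

# (11): xi(1|x_{1:n}) directly, and as the posterior-weighted average of theta
xi_next = sum(w * bernoulli_prob(th, xs + [1]) for w, th in zip(weights, thetas)) / xi_n
xi_next_post = sum(p * th for p, th in zip(post, thetas))
print(xi_next, xi_next_post, sum(post))

# (12): likelihood ratio versus its relative-entropy form
for th in thetas:
    ratio = bernoulli_prob(th, xs) / bernoulli_prob(theta0, xs)
    print(th, ratio, math.exp(n * (kl(theta_hat, theta0) - kl(theta_hat, th))))
```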

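(i) Let $\Theta$ be a countable dense subset of $(0,1)$ and $x_{1:\infty}$ be $\mu/\xi$-random. Using (9) and (12) in (10), for $\theta\in\Theta$ to be determined later, we can bound

$$e^{n[D(\hat\theta_n\|\theta_0)-D(\hat\theta_n\|\theta)]} = \frac{\mu_\theta(x_{1:n})}{\mu_{\theta_0}(x_{1:n})} \le \frac{\xi(x_{1:n})}{w_\theta\,\mu_{\theta_0}(x_{1:n})} \le \frac{c_\omega}{w_\theta} =: c' < \infty \tag{14}$$

Let us assume that $\hat\theta\equiv\hat\theta_n\not\to\theta_0$. This implies that there exists a cluster point $\tilde\theta\ne\theta_0$ of the sequence $\hat\theta_n$, i.e. $\hat\theta_n$ is infinitely often in an $\varepsilon$-neighborhood of $\tilde\theta$, e.g. $D(\hat\theta_n\|\tilde\theta) < \varepsilon$ for infinitely many $n$. $\tilde\theta\in[0,1]$ may be outside $\Theta$. Since $\tilde\theta\ne\theta_0$, this implies that $\hat\theta_n$ must be "far" away from $\theta_0$ infinitely often. For instance, for $\varepsilon = \frac14(\tilde\theta-\theta_0)^2$, using $D(\hat\theta\|\tilde\theta) + D(\hat\theta\|\theta_0)\ge(\tilde\theta-\theta_0)^2$, we get $D(\hat\theta_n\|\theta_0) > 3\varepsilon$. We now choose $\theta\in\Theta$ so near to $\tilde\theta$ that $|D(\hat\theta_n\|\theta) - D(\hat\theta_n\|\tilde\theta)| < \varepsilon$ (here we use denseness of $\Theta$). Chaining all inequalities we get $D(\hat\theta_n\|\theta_0) - D(\hat\theta_n\|\theta) > 3\varepsilon-\varepsilon-\varepsilon = \varepsilon > 0$. This, together with (14), implies $e^{n\varepsilon}\le c'$ for infinitely many $n$, which is impossible. Hence, the assumption $\hat\theta_n\not\to\theta_0$ was wrong.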
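The auxiliary inequality $D(\hat\theta\|\tilde\theta) + D(\hat\theta\|\theta_0)\ge(\tilde\theta-\theta_0)^2$ used above follows from Pinsker's inequality $D(a\|b)\ge 2(a-b)^2$; it can also be spot-checked on a grid. A minimal sketch, with the grid resolution chosen arbitrarily:

```python
import itertools
import math


def kl(a, b):
    """Binary relative entropy D(a||b) with 0 ln 0 := 0; requires 0 < b < 1."""
    total = 0.0
    if a > 0:
        total += a * math.log(a / b)
    if a < 1:
        total += (1 - a) * math.log((1 - a) / (1 - b))
    return total


hats = [i / 50 for i in range(51)]        # candidate values of theta_hat in [0, 1]
params = [i / 50 for i in range(1, 50)]   # candidate values of theta_tilde and theta_0 in (0, 1)
slack = min(
    kl(a, b) + kl(a, c) - (b - c) ** 2
    for a, b, c in itertools.product(hats, params, params)
)
print(slack)   # the inequality predicts slack >= 0
```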

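Now, $\hat\theta_n\to\theta_0$ implies that for arbitrary $\theta\ne\theta_0$, $\theta\in\Theta$, and for sufficiently large $n$ there exists $\delta_\theta > 0$ such that $D(\hat\theta_n\|\theta)\ge 2\delta_\theta$ (since $D(\theta_0\|\theta)\ne 0$) and $D(\hat\theta_n\|\theta_0)\le\delta_\theta$. This implies

$$w_\theta^n \le \frac{w_\theta}{w_{\theta_0}}e^{n[D(\hat\theta_n\|\theta_0)-D(\hat\theta_n\|\theta)]} \le \frac{w_\theta}{w_{\theta_0}}e^{-n\delta_\theta}\xrightarrow{n\to\infty}0,$$

where we have used (11) and (12) in the first inequality, and the second inequality holds for sufficiently large $n$. Hence $\sum_{\theta\ne\theta_0}w_\theta^n\to 0$ by (13) and $w_{\theta_0}^n\to 1$ by normalization (11), which finally gives

$$\xi(1|x_{1:n}) = w_{\theta_0}^n\theta_0 + \sum_{\theta\ne\theta_0}w_\theta^n\theta \;\longrightarrow\; \theta_0.$$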
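(ii) We first consider the case $\Theta = \{\theta_0,\theta_1\}$: Let us choose $\bar\theta$ $\big(= \ln\frac{1-\theta_0}{1-\theta_1}\big/\ln\frac{\theta_1(1-\theta_0)}{\theta_0(1-\theta_1)}\notin\Theta\big)$ in the (KL) middle of $\theta_0$ and $\theta_1$ such that

$$D(\bar\theta\|\theta_0) = D(\bar\theta\|\theta_1), \qquad 0<\theta_0<\bar\theta<\theta_1<1, \tag{15}$$

and choose $x_{1:\infty}$ such that $\hat\theta_n := \frac{n_1}{n}$ satisfies $|\hat\theta_n-\bar\theta|\le\frac1n$ ($\Rightarrow\hat\theta_n\to\bar\theta$).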
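Formula (15) and the choice of $x_{1:\infty}$ can be sketched as follows. The greedy rule used to build the sequence (append a 1 exactly when that keeps $n_1$ closer to $n\bar\theta$) is our own choice, not taken from the text; it keeps $|n_1 - n\bar\theta|\le\frac12$ and hence satisfies $|\hat\theta_n-\bar\theta|\le\frac1n$. The gap boundaries $\theta_0 = 0.25$, $\theta_1 = 0.5$ are hypothetical.

```python
import math


def kl(a, b):
    """Binary relative entropy D(a||b) for 0 < a, b < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))


def kl_middle(theta0, theta1):
    """theta_bar with D(theta_bar||theta0) = D(theta_bar||theta1), as in (15)."""
    return math.log((1 - theta0) / (1 - theta1)) / math.log(
        theta1 * (1 - theta0) / (theta0 * (1 - theta1))
    )


def tracking_sequence(theta_bar, n_max):
    """Greedy binary sequence whose relative frequency stays within 1/(2n) of theta_bar."""
    xs, ones = [], 0
    for n in range(1, n_max + 1):
        # append a 1 exactly when that brings n_1 closer to n * theta_bar
        bit = 1 if abs(ones + 1 - n * theta_bar) < abs(ones - n * theta_bar) else 0
        ones += bit
        xs.append(bit)
    return xs


theta0, theta1 = 0.25, 0.5            # hypothetical gap boundaries
theta_bar = kl_middle(theta0, theta1)
xs = tracking_sequence(theta_bar, 1000)
print(theta_bar, kl(theta_bar, theta0), kl(theta_bar, theta1))
```

For symmetric boundaries such as $\theta_0 = \frac13$, $\theta_1 = \frac23$ the KL middle is $\bar\theta = \frac12$, as one would expect.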

We will show that $x_{1:\infty}$ is $\mu_{\theta_0}/\xi$-random and $\mu_{\theta_1}/\xi$-random. Obviously $\xi$ cannot converge to both $\theta_0$ and $\theta_1$, thus proving $\mathcal{M}$-non-convergence. ($x_{1:\infty}$ is obviously not $\mu_{\theta_{0/1}}$ M.L.-random, since the relative frequency $\hat\theta_n\not\to\theta_{0/1}$. $x_{1:\infty}$ is not even $\mu_{\bar\theta}$ M.L.-random, since $\hat\theta_n$ converges too fast ($\sim\frac1n$). $x_{1:\infty}$ is indeed very regular, whereas $\hat\theta_n$ of a truly $\mu_{\bar\theta}$ M.L.-random sequence has fluctuations of the order $1/\sqrt{n}$. The fast convergence is necessary for double $\mu/\xi$-randomness. The reason that $x_{1:\infty}$ is $\mu/\xi$-random, but not M.L.-random, is that $\mu/\xi$-randomness is a weaker concept than M.L.-randomness for $\mathcal{M}\subset\mathcal{M}_{enum}^{semi}$. Only regularities characterized by $\nu\in\mathcal{M}$ are recognized by $\mu/\xi$-randomness.)

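In the following we assume that $n$ is sufficiently large such that $\theta_0<\hat\theta_n<\theta_1$. We need

$$|D(\hat\theta\|\theta) - D(\bar\theta\|\theta)| \le c\,|\hat\theta-\bar\theta| \quad \forall\theta,\hat\theta,\bar\theta\in[\theta_0,\theta_1] \quad\text{with}\quad c := \ln\frac{\theta_1(1-\theta_0)}{\theta_0(1-\theta_1)} < \infty \tag{16}$$

which follows for $\hat\theta>\bar\theta$ (similarly $\hat\theta<\bar\theta$) from

$$D(\hat\theta\|\theta) - D(\bar\theta\|\theta) = \int_{\bar\theta}^{\hat\theta}\Big[\ln\frac{\theta'}{1-\theta'} - \ln\frac{\theta}{1-\theta}\Big]d\theta' \le \int_{\bar\theta}^{\hat\theta}\Big[\ln\frac{\theta_1}{1-\theta_1} - \ln\frac{\theta_0}{1-\theta_0}\Big]d\theta' = c\cdot(\hat\theta-\bar\theta)$$

where we have increased $\theta'$ to $\theta_1$ and decreased $\theta$ to $\theta_0$ in the inequality. Using (16) in (12) twice we get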




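$$\frac{\mu_{\theta_1}(x_{1:n})}{\mu_{\theta_0}(x_{1:n})} = e^{n[D(\hat\theta_n\|\theta_0)-D(\hat\theta_n\|\theta_1)]} \le e^{n[D(\bar\theta\|\theta_0)+c|\hat\theta_n-\bar\theta|-D(\bar\theta\|\theta_1)+c|\hat\theta_n-\bar\theta|]} \le e^{2c} \tag{17}$$

where we have used (15) and $|\hat\theta_n-\bar\theta|\le\frac1n$ in the last inequality. Now, (11) and (17) lead to

$$w_{\theta_0}^n = \bigg[1+\frac{w_{\theta_1}\mu_{\theta_1}(x_{1:n})}{w_{\theta_0}\mu_{\theta_0}(x_{1:n})}\bigg]^{-1} \ge \bigg[1+\frac{w_{\theta_1}}{w_{\theta_0}}e^{2c}\bigg]^{-1} =: c_0 > 0, \tag{18}$$

which shows that $x_{1:\infty}$ is $\mu_{\theta_0}/\xi$-random by (10), since $\xi(x_{1:n}) = w_{\theta_0}\mu_{\theta_0}(x_{1:n})/w_{\theta_0}^n \le (w_{\theta_0}/c_0)\,\mu_{\theta_0}(x_{1:n})$. Exchanging $\theta_0\leftrightarrow\theta_1$ in (17) and (18) we similarly get $w_{\theta_1}^n\ge c_1>0$, which implies (using $w_{\theta_0}^n+w_{\theta_1}^n = 1$)

$$\xi(1|x_{1:n}) = \sum_{\theta\in\{\theta_0,\theta_1\}}w_\theta^n\mu_\theta(1|x_{1:n}) = w_{\theta_0}^n\theta_0 + w_{\theta_1}^n\theta_1 \ge \theta_0 + c_1(\theta_1-\theta_0) > \theta_0 = \mu_{\theta_0}(1|x_{1:n}). \tag{19}$$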


This shows $\xi(1|x_{1:n})\not\to\mu(1|x_{1:n})$. One can show that $\xi(1|x_{1:n})$ not only fails to converge to $\theta_0$ (and $\theta_1$), but does not converge at all. The fast convergence demand $|\hat\theta_n-\bar\theta|\le\frac1n$ on $x_{1:\infty}$ can be weakened to $\hat\theta_n\le\bar\theta+O(\frac1n)\ \forall n$ and $\hat\theta_n\ge\bar\theta-O(\frac1n)$ for infinitely many $n$; then $x_{1:\infty}$ is still $\mu/\xi$-random, and $w_{\theta_1}^n\ge c'_1>0$ for infinitely many $n$, which is sufficient to prove $\xi\not\to\mu$.
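For the two-point case $\Theta = \{\theta_0,\theta_1\}$ the argument can be replayed numerically. The sketch below uses hypothetical values $\theta_0 = 0.25$, $\theta_1 = 0.5$, $w_{\theta_0} = w_{\theta_1} = \frac12$ and our own greedy rule for choosing $x_n$. It checks the Lipschitz bound (16) on a grid, builds a sequence with $|\hat\theta_n-\bar\theta|\le\frac1n$, tracks the posterior weights $w_\theta^n$ of (11) in log space (to avoid underflow of $\mu_\theta(x_{1:n})$), and compares their minima with the constant $c_0$ of (18).

```python
import math


def kl(a, b):
    """Binary relative entropy D(a||b) for 0 < a, b < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))


theta0, theta1 = 0.25, 0.5                # hypothetical gap boundaries
w = {theta0: 0.5, theta1: 0.5}            # hypothetical prior weights
c = math.log(theta1 * (1 - theta0) / (theta0 * (1 - theta1)))   # constant c of (16)
theta_bar = math.log((1 - theta0) / (1 - theta1)) / c           # KL middle from (15)

# Lipschitz bound (16) on a grid of [theta0, theta1]
grid = [theta0 + (theta1 - theta0) * i / 40 for i in range(41)]
lipschitz_ok = all(
    abs(kl(a, t) - kl(b, t)) <= c * abs(a - b) + 1e-12
    for a in grid for b in grid for t in grid
)

# Build x_n greedily so that |n_1 - n*theta_bar| <= 1/2, tracking log mu_theta(x_{1:n})
ones, log_mu = 0, {theta0: 0.0, theta1: 0.0}
min_post = {theta0: 1.0, theta1: 1.0}
xi_next = []                               # xi(1|x_{1:n}) for n = 1, 2, ...
for n in range(1, 5001):
    bit = 1 if abs(ones + 1 - n * theta_bar) < abs(ones - n * theta_bar) else 0
    ones += bit
    for th in log_mu:
        log_mu[th] += math.log(th if bit else 1 - th)
    # posterior weights w_theta^n of (11), normalized in log space
    top = max(log_mu.values())
    unnorm = {th: w[th] * math.exp(log_mu[th] - top) for th in log_mu}
    total = sum(unnorm.values())
    post = {th: u / total for th, u in unnorm.items()}
    for th in post:
        min_post[th] = min(min_post[th], post[th])
    xi_next.append(sum(post[th] * th for th in post))          # as in (19)

c0 = 1 / (1 + w[theta1] / w[theta0] * math.exp(2 * c))       # lower bound (18)
print(lipschitz_ok, min_post, c0)
print(min(xi_next), max(xi_next))      # stays strictly inside (theta0, theta1)
```

Both posterior weights stay bounded away from zero, so the predictive probability keeps oscillating strictly between $\theta_0$ and $\theta_1$ instead of settling on the true $\theta_0$.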

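We now consider general $\Theta$ with a gap in the sense that there exist $0<\theta_0<\theta_1<1$ with $[\theta_0,\theta_1]\cap\Theta = \{\theta_0,\theta_1\}$: We show that all $\theta\ne\theta_0,\theta_1$ give asymptotically no contribution to $\xi(1|x_{1:n})$, i.e. (19) still applies. Let $\theta\in\Theta\setminus\{\theta_0,\theta_1\}$; all other definitions as before. Then $\delta_\theta := D(\bar\theta\|\theta) - D(\bar\theta\|\theta_{0/1}) > 0$, since $\theta$ is farther than $\theta_{0/1}$ away from $\bar\theta$ ($|\theta-\bar\theta| > |\theta_{0/1}-\bar\theta|$). Similarly to (17), with $\theta$ instead of $\theta_1$, we get

$$\frac{\mu_\theta(x_{1:n})}{\mu_{\theta_0}(x_{1:n})} = e^{n[D(\hat\theta_n\|\theta_0)-D(\hat\theta_n\|\theta)]} \le e^{2c}\cdot e^{n[D(\bar\theta\|\theta_0)-D(\bar\theta\|\theta)]} = e^{2c}e^{-n\delta_\theta} \xrightarrow{n\to\infty} 0.$$

Hence $w_\theta^n\to 0$ from (11), and $\varepsilon_n := \sum_{\theta\in\Theta\setminus\{\theta_0,\theta_1\}}w_\theta^n\mu_\theta(1|x_{1:n})\to 0$ from (13). Hence $\xi(1|x_{1:n}) = w_{\theta_0}^n\theta_0 + w_{\theta_1}^n\theta_1 + \varepsilon_n \ne \theta_0 = \mu_{\theta_0}(1|x_{1:n})$ for sufficiently large $n$, since $\varepsilon_n\to 0$, $w_{\theta_1}^n\ge c'_1>0$ and $\theta_0\ne\theta_1$. □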


9 Conclusions 

For a hierarchy of four computability definitions, we completed the classification of the existence of computable (semi)measures dominating all computable (semi)measures. Dominance is an important property of a prior, since it implies rapid convergence of the corresponding posterior with probability one. A strengthening would be convergence for all Martin-Löf (M.L.) random sequences. This seems natural, since M.L. randomness can be defined in terms of Solomonoff's prior $M$, so there is a close connection. Contrary to what was believed before, the question of posterior convergence $M/\mu\to 1$ for all M.L. random sequences is still open. Some exciting progress has been made recently in [HM04], partially answering this question. We introduced a new flexible notion of $\mu/\xi$-randomness which contains Martin-Löf randomness as a special case. Though this notion may have a wider range of application, the main purpose for its introduction was to show that standard proof attempts of $M/\mu\to 1$ based on dominance only must fail. This follows from the derived result that the validity of $\xi/\mu\to 1$ for $\mu/\xi$-random sequences depends on the Bayes mixture